Python Machine Learning Tutorial #12 - Implementing K-Means Clustering
Key Takeaways
Implements K-Means Clustering using Python and scikit-learn
Full Transcript
Hey guys, and welcome back to the machine learning tutorial with Python. In today's video we're going to be finishing up with k-means. This will be a quick video; we're just going to do a really simple implementation of k-means. I'm kind of going to leave you on a bit of a cliffhanger here, in that I'm not going to explain everything I do. I'll give you some further reading for those of you who are interested, and then as we move into more unsupervised learning algorithms, as I get into neural networks, which will be another series coming soon, I'll be talking more specifically about what some of the accuracy measures mean for this data set, because unlike other data sets it's actually more difficult to test this one for accuracy and validity. You'll see that in just a second.

Essentially, these are my imports here. Again, all this stuff will be up on techwithtim.net if you guys want to copy the code. The code that I showed in the last video, the one that ran all of that image stuff, will be up on techwithtim.net as well, so go ahead to the link in the description and you can copy the code from there if you don't want to type it out with me.

Essentially, we're going to load in our data. We've got load_digits here from sklearn.datasets, the same kind of thing as what we loaded before; I forget which one it was, but we used some kind of sklearn data set. So we'll say digits equals load_digits(). Now we're going to say data equals scale, I'll talk about this in a second, of digits.data. This .data part is all of our features, so we're going to scale all of our features down so that they're within the values negative one and one, and the reason we do that
is because our digits by default are going to have large values. I believe it's an RGB value, or it might be a grayscale value, I honestly don't know, but they're going to be large. By scaling them down we're going to save time on the computations, especially because we're computing distances between points, so having smaller values will just be better and it'll lead to fewer outliers and all that. So it'll make things faster, essentially.

Now we're going to get our labels. What we'll do is say y equals data.targets, like that. Target or targets, I want to say it's targets, but we'll see. Anyway, we've got our targets now, and what I'm going to do is set the number of clusters that we're going to look for, the number of centroids to make. We could do this fancy thing where we call np.unique on y and take the length of that, which would be the dynamic way to do it if we were going to change this data set, or we could just type 10, because we're going to have 10 digits. Feel free; I just want to show you that way in case you want to see how to get the number of different classes for a data set dynamically.

Now we're going to get the number of instances, the number of digits that we're going to classify, and the number of features that go along with that data. To do this I'm going to say samples, features equals data.shape. The reason this works is that shape looks something like this: say we have a thousand instances and it's by 728, then it'll just unpack that for us. We can do that in Python pretty straightforwardly; you guys probably already know that.

Now I'm going to bring in a function that we used in the last video; you
saw it when all the images were popping up and we saw the centroids on a nice graph in matplotlib. They have a really nice way of scoring this. I could come up with my own way to score this and measure the accuracy, but why don't we just take it straight from sklearn, as that's the module we're using. So I'm just going to copy this in and we'll talk about what it's doing. It's a big function, just be aware.

This actually reminds me that I have to import metrics from sklearn, so: from sklearn import metrics. sklearn has a bunch of functions in there that will automatically score supervised or unsupervised learning algorithms. We can see a ton of different ones here: completeness score, V-measure score, adjusted Rand score, mutual info, silhouette score. Honestly, I don't know what all of them do; there are a lot of them. All I know is the range you're looking for with some of these scores, because there's some crazy math behind how they score the model, and each of them represents a different thing.

Essentially, what we have here is this bench_k_means function. We're going to create a classifier down here and call this function on it, and it's going to print out all this information. What this allows us to do is train a ton of different classifiers and score them just by calling this function. We give it the classifier, it fits our data, which is another argument, and then it uses a bunch of different metrics to score it. For the homogeneity score, I think that's how you say it, we take our y values, which are up here, all our targets, and compare them to the labels that the estimator gave for each of our data points. Now remember, because
this is an unsupervised learning algorithm and we don't give it the y values when we train, it automatically generates a y value for every single data point that we give it. So we don't actually have to split into test and training data, because it doesn't know to start with what our test data is. We can just compare the true labels to what our estimator, our classifier, predicted each label was, and that allows us to train on maybe less data, per se, because we don't need that train/test split that we used in all the other videos. So that's good to know.

For this metric here, all this is doing is saying that when we compute the silhouette score we're going to use Euclidean distance, which is just the straight-line distance between two points or two vectors in a space. There are some other distances we could mess around with, but we're just going to use Euclidean for now.

To make our classifier we're going to say classifier equals KMeans. Is this incorrect, is it a capital M? Yes, okay. For this classifier we need to define, first of all, the number of centroids; we need to give it the number of times we're going to run the classifier; and we can give it a max number of iterations. There are a ton of different parameters, and I'll show you here: if I go to this page you can read through all of the different parameters.

The first one we're going to set is n_clusters, and this is just going to be equal to however many things we're trying to classify. It's the same as the number of centroids, essentially, so we'll say n_clusters is equal to k, and that's what we've defined up here as ten. Okay, what else do we need? Let's go back here, and I'm just going to read these
off, because obviously I don't remember all of them. init: okay, so remember how I was saying we can have our centroids, those little X's, in random positions when we generate them? Well, that is correct, but we can also place them in a way that makes a bit more sense. I don't know exactly how it does it mathematically, but I'm pretty sure it just lays them out so they're about an equal distance from each other on the grid, or in the space, and we can do that if we set init to k-means++. You can play around with choosing either random or k-means++ and see if you get a better classifier. It shouldn't make a massive difference; k-means++ essentially just changes the location of your initial centroids so that maybe it runs a bit faster and you don't have to iterate as many times. For init I'm just going to use random for right now; it doesn't really matter that much. So init equals "random".

Let's go back here. What else do we need? n_init: this is the number of times we're going to run the algorithm. Because we are randomly placing these centroids, we might sometimes get a better result by running the algorithm with a different random location for the centroids. That's essentially what this is saying: the number of times the k-means algorithm will be run with different centroid seeds. That's how many times we're going to randomly generate the centroids for the first iteration. So we can run this ten times, and then it's going to take the best one and that's going to be our classifier. I hope that makes sense. For n_init we can set it equal to ten, just so we have it here; that's the default value. You can increase it, and obviously if you increase it it's going to take longer, and if you decrease it it's going to be shorter.

max_iter: I recommend
you just leave this as 300. Remember I was saying we keep repeating the process until eventually nothing changes? Well, that could take a very long time, especially if we have a ton of data, so this automatically caps us at 300 iterations. If you want the best possible classifier and it doesn't matter how much time it takes, you can set this to, actually I don't know if there's an infinite value, but I think you'd just set it to a very high value, and hopefully it never even gets to that because it'll just have, like, a perfect classifier by that point. Does that make sense?

Okay, so with these ones now we're going into some more hyperparameters that I'm not really going to talk about, like verbose. If you guys want to read through this, I have all the links on my website and in the description. But essentially that's all we need for our classifier, so now we'll pass it to bench_k_means. We'll say bench_k_means, give it our classifier, which can be clf, give it the name, which I'm just going to say is "1" in this case, and give it our data, which is just called data.

I don't know if I have the right run configuration right now. Let's edit this: k-means tutorial, I believe it's the working file; yes, it is. So we'll apply that, and let's just run this and see what we get. "numpy.ndarray object has no attribute targets". Pretty sure it's target. Let's try this: I think it might be digits, digits.targets. No targets; let's try target. Sorry guys, give me one second. Yeah, it would probably help if I spelled target correctly. Wow. All right, so target. And "__init__ got an unexpected keyword n_cluster", so I believe that should be n_clusters. And there we go. Okay, awesome, so now it's printing out all of our accuracy scores for us.
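A minimal sketch of the setup walked through above, with the two typos from the run already fixed (`digits.target` and `n_clusters`). It assumes a reasonably recent scikit-learn. Two details are worth flagging: `scale` standardizes each feature to zero mean and unit variance rather than squeezing values into the range -1 to 1, and `init="k-means++"` picks starting centroids by sampling points that tend to be far from the ones already chosen rather than laying them out on an even grid. The `random_state=0` argument is an addition for repeatability and is not part of the video's code; `n_init=10` is passed explicitly because newer scikit-learn releases no longer default to 10.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)   # standardize every feature column
y = digits.target           # the attribute is `target`, not `targets`

# "Dynamic" way to get the number of clusters: count the distinct labels.
k = len(np.unique(y))

# shape is a (rows, columns) tuple, so it unpacks into two names.
samples, features = data.shape

clf = KMeans(
    n_clusters=k,      # number of centroids
    init="random",     # or "k-means++"
    n_init=10,         # restarts with different starting centroids; best run is kept
    max_iter=300,      # cap on iterations per run
    random_state=0,    # assumption: fixed seed so reruns give the same result
)

print(k, samples, features)
```

Swapping `init="random"` for `init="k-means++"` is a one-word change, which makes it easy to compare the two.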
Okay, so we have 69417, which I believe is just giving us, yeah, the inertia. Then we have all of these different scores, which represent homogeneity, completeness, V-measure, adjusted Rand and all that. I'm not going to talk about what these mean. Essentially, the higher the better for most of them, not all of them, but if you want to read about it, and I recommend you do, I'll leave this link in the description. It goes through what all of the different scores mean and derives the mathematical equations for you, and it's pretty interesting. We'll talk about what all these mean in the neural networks series, but for right now I'll leave the link if you guys want to read that.

And that's going to be it for this machine learning tutorial. If you guys enjoyed these tutorials, please make sure you let me know in the comments, and follow my Twitter or join my Discord. Neural networks will be coming soon. In the meantime I'm thinking about doing some kind of Discord bot tutorial, and maybe we'll do some Kivy app development. Let me know what you guys want to see down below, and with that being said, I'll see you again in another video. [Music]
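The scoring function pasted in during the video is not shown in the transcript, so the following is only a sketch of what a `bench_k_means` along those lines could look like, built from the metric names mentioned above. Passing the true labels in as a fourth argument and returning a dictionary are choices made here for self-containment, not something the video confirms; `random_state=0` is likewise an assumption.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale

digits = load_digits()
data = scale(digits.data)
y = digits.target


def bench_k_means(estimator, name, data, labels):
    """Fit the estimator on data and score its clusters against the true labels."""
    estimator.fit(data)
    pred = estimator.labels_  # cluster index assigned to each sample
    scores = {
        "inertia": estimator.inertia_,
        "homogeneity": metrics.homogeneity_score(labels, pred),
        "completeness": metrics.completeness_score(labels, pred),
        "v_measure": metrics.v_measure_score(labels, pred),
        "adjusted_rand": metrics.adjusted_rand_score(labels, pred),
        "adjusted_mutual_info": metrics.adjusted_mutual_info_score(labels, pred),
        # Silhouette needs the data itself, not the true labels.
        "silhouette": metrics.silhouette_score(data, pred, metric="euclidean"),
    }
    print(name, {key: round(float(value), 3) for key, value in scores.items()})
    return scores


clf = KMeans(n_clusters=10, init="random", n_init=10, random_state=0)
scores = bench_k_means(clf, "1", data, y)
```

On reading the output: inertia is the sum of squared distances from each sample to its nearest centroid, so lower is better there, while homogeneity, completeness and V-measure run from 0 to 1 and silhouette from -1 to 1, with higher being better.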
Original Description
This Python machine learning tutorial covers k-means clustering: how to implement k-means clustering in Python using sklearn.
Text-Based Tutorial: https://techwithtim.net/tutorials/machine-learning-python/k-means-2/
SkLearn Performance Evaluation: https://scikit-learn.org/stable/modules/clustering.html#clustering-evaluation
SkLearn Example Code: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
**************************************************************
WEBSITE: https://techwithtim.net
Twitter: https://twitter.com/TechWithTimm
Join my discord server: https://discord.gg/pr2k55t
**************************************************************
Please leave a LIKE and SUBSCRIBE for more content!
Tags:
- Tech With Tim
- Python Tutorials
- Python machine learning tutorial
- Machine learning tutorial python
- How does k-means work
- K means clustering python
- K means clustering tutorial