If you are a beginner in machine learning or data science, one of the first concepts you will need to understand is the difference between two of its most talked-about branches – supervised and unsupervised learning. Honestly, these two used to confuse me when I was starting out, and if you are starting out too, you would probably like a clearer understanding of how they differ. So in this post we will do an in-depth comparison of supervised vs unsupervised learning.
Supervised vs Unsupervised Learning
Supervised and unsupervised learning both belong to the family of machine learning, but they have very different characteristics. The difference is reflected in their names: it lies in the way they learn from data – ‘supervised’ vs ‘unsupervised’.
Let us understand this in more detail below.
In supervised learning, as the name suggests, there is supervision while the machine learning model is being created.
- There is a training phase in which the machine learning model is fed the data.
- The training data has distinct input data and a corresponding output label.
- The model is trained on this data and is made to learn the relationship between each input and its output.
- Learning typically takes place over many passes through the training data, known as epochs. In each epoch the model learns a bit, and it is given feedback so that it can learn better in the next epoch.
As we can see from the steps above, supervision is given to the model in the form of labelled training data and feedback for improvement. Hence the name supervised learning!
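The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a definitive implementation: the toy data (labels generated from y = 2x + 1), the function name `train`, and the learning rate and epoch count are all assumptions made for the example. Each epoch makes predictions, measures the error against the output labels, and uses that error as feedback to adjust the model.

```python
# Toy training data: each input x comes with its output label y (here y = 2x + 1).
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(inputs, labels, epochs=1000, learning_rate=0.05):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(inputs)
    history = []  # loss recorded at the start of every epoch
    for _ in range(epochs):
        # Predict with the current parameters and compare with the labels.
        errors = [(w * x + b) - y for x, y in zip(inputs, labels)]
        history.append(sum(e * e for e in errors) / n)
        # Feedback: gradients of the loss tell us how to adjust w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, inputs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history


w, b, history = train(inputs, labels)
print(f"learned w={w:.3f}, b={b:.3f}")
print(f"loss went from {history[0]:.3f} to {history[-1]:.8f}")
```

The loss shrinking from epoch to epoch is the "learning a bit in each epoch" described above; the labels are what make the feedback possible.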
In unsupervised learning, as the name indicates, there is no supervision. The learning steps are quite different here.
- There is no supervised training phase, and the data has no distinction between input and output.
- The model is simply fed all the data and left on its own to find patterns and relationships within it. That is it!
This way of learning looks rather blunt, doesn’t it? No hand-holding, no labels and no supervision for our model while it learns! Well, this is why it is known as unsupervised learning.
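To make this concrete, here is a small sketch of a model that is handed unlabelled numbers and left to find groups on its own. It uses a bare-bones one-dimensional k-means, chosen here purely for illustration; the function name `kmeans_1d`, the sample values and the way the starting centroids are picked are assumptions of this example.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group 1-D values into k clusters. No output labels are supplied."""
    ordered = sorted(values)
    # Assumption: start the centroids at evenly spaced points of the sorted data.
    step = max(k - 1, 1)
    centroids = [ordered[(i * (len(ordered) - 1)) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]  # raw values, no labels attached
centroids, clusters = kmeans_1d(data, k=2)
print("centroids:", centroids)
print("clusters:", clusters)
```

Notice that nothing in the data says which group a value belongs to; the grouping comes only from how close the values are to each other.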
A Helpful Analogy
When I was starting out, I used to visualize the two learning approaches as below –
- Supervised Learning – Dad teaches his child how to swim in a pool. The child learns to swim gradually over many sessions, based on regular feedback from dad.
- Unsupervised Learning – There is no dad to supervise here. The child simply jumps into the pool and has to learn to swim on their own.
The child is the machine learning model and the swimming pool is the data. And who is dad here? My bet is that dad has to be the cost function, which uses the output labels to give error feedback in supervised learning 😀
I hope this analogy helps you hold on to the concept too.
Supervised vs Unsupervised Learning – Difference in data
As you may have noticed, I mentioned that in unsupervised learning the data has no distinct input and output, unlike in supervised learning.
Output labels may be absent from the data in the following scenarios –
- The nature of the data is such that the concept of an output label does not arise.
- The data is fairly unexplored, and even though output labels might exist, they are yet to be discovered or derived. That is to say, the data is raw and needs some exploration or processing.
In the absence of any output label, the model has no input-output mapping to learn, and hence there is no supervised training phase in unsupervised learning! The model is still fitted to the data, but without any label-based feedback.
Sub-Categories of Supervised and Unsupervised Learning
Supervised learning has two main subcategories –
Regression – In such problems the model predicts a continuous number as output, based on a given set of input data.
Classification – Here the model predicts an output class or category, based on the given input data.
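The difference between the two is easiest to see when the same inputs are paired with two kinds of labels. The sketch below uses a simple k-nearest-neighbours approach, picked here only because it is short; the house sizes, prices and category names are made-up values for illustration. With numeric labels the prediction is a number (regression), and with category labels the prediction is a class (classification).

```python
from collections import Counter


def nearest_neighbors(train_x, query, k):
    """Return the indices of the k training inputs closest to the query."""
    return sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]


def knn_regress(train_x, train_y, query, k=3):
    """Regression: predict a continuous number (mean of the neighbours' labels)."""
    idx = nearest_neighbors(train_x, query, k)
    return sum(train_y[i] for i in idx) / k


def knn_classify(train_x, train_labels, query, k=3):
    """Classification: predict a category (majority vote of the neighbours' labels)."""
    idx = nearest_neighbors(train_x, query, k)
    return Counter(train_labels[i] for i in idx).most_common(1)[0][0]


# Hypothetical data: house size as input, with two different kinds of output label.
sizes = [50, 60, 70, 120, 130, 140]
prices = [100.0, 120.0, 140.0, 240.0, 260.0, 280.0]               # continuous labels
categories = ["small", "small", "small", "large", "large", "large"]  # class labels

print(knn_regress(sizes, prices, 65))       # a number
print(knn_classify(sizes, categories, 65))  # a category
```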
Unsupervised learning has three main subcategories –
Clustering – Here the model divides the data into multiple groups, or clusters, based on the similarity between data points. Data points that are more similar to each other are placed in the same cluster.
Association Rule Mining – In these problems the model learns the associations between items in the data and comes up with rules that describe them.
Dimension Reduction – This method is used on data that has a very large number of dimensions. With these techniques the number of dimensions can be reduced while preserving as much of the underlying relationships within the data as possible. This is useful for avoiding the curse of dimensionality.
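As a rough illustration of dimension reduction, the sketch below squeezes two-dimensional points down to a single number each by projecting them onto the direction in which the data varies the most (the idea behind principal component analysis). The function name `reduce_2d_to_1d` and the sample points are assumptions of this example, and the closed-form angle used here only works for the two-dimensional case.

```python
import math


def reduce_2d_to_1d(points):
    """Project 2-D points onto their direction of greatest variance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the principal axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return projected, direction


# Sample points that all lie on the line y = 2x, so one dimension is enough.
points = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
projected, direction = reduce_2d_to_1d(points)
print("1-D coordinates:", projected)
print("direction kept:", direction)
```

Because these sample points lie exactly on a line, the distances between them are the same before and after the reduction; with real data some information is usually lost.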
Supervised and Unsupervised Learning Use Cases
As you have seen, supervised and unsupervised learning are two quite different approaches to machine learning, and each has its own use cases.
Some of the common and popular applications of supervised learning are –
- Risk Analysis – Helps businesses forecast risk well in advance.
- Business Prediction – Helps businesses predict things like sales or customer churn rate well in advance, so that they can be prepared.
- Medical Diagnosis – It can help predict or diagnose diseases in patients.
- Loan approvals – It helps banks decide whether a loan application should be approved or rejected.
- Stock Market Prediction – It is used to try to forecast stock market trends.
- Spam mail classification – It can classify whether incoming mails are spam or not.
- Chat Bots – Chat bots have become a craze now. Many AI-driven chat bots are trained, at least in part, using supervised learning.
I could go on and on here: the list of supervised learning use cases is huge, considering we are now living in the era of Big Data.
Let us now see some popular use cases of unsupervised learning –
- Customer Segmentation – It helps businesses group customers based on their common behavior, so that they can be targeted much better with offers and promotions.
- Anomaly detection – It helps discover rare occurrences within the data that do not conform to the rest of the data.
- Fraud detection – This is an extension of anomaly detection in which the anomalies are best explained by fraud or malicious intent. It is very helpful in banking, where fraudulent transactions have to be flagged at the right time.
- Market Basket Analysis – This is used to analyse the purchasing behavior of customers, such as which items are frequently purchased together.
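As an example of the last use case, market basket analysis can be sketched by counting which items show up together in the same basket. The shopping baskets below are invented for illustration, and `pair_counts` and `confidence` are hypothetical helper names; real association rule mining typically adds thresholds and looks at larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; note there are no output labels anywhere.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_counts(transactions):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


counts = pair_counts(transactions)
top_pair, top_count = counts.most_common(1)[0]
print("most frequent pair:", top_pair, "seen in", top_count, "baskets")
print("confidence of bread -> butter:", confidence(transactions, "bread", "butter"))
```

A rule such as "customers who buy bread also tend to buy butter" is read off from these counts rather than learned from any labelled examples.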
A quick summary
Here is a quick summary of what we have learnt in this post.
- In supervised learning the data has an output label, whereas in unsupervised learning there is no output label.
- Supervised learning has a training phase in which the model learns the mapping between input and output data.
- In unsupervised learning, since there is no output label, there is no training to learn such a mapping function; the model looks for patterns in the data on its own.
- Supervised learning has two broad categories – 1) Regression and 2) Classification
- Unsupervised learning has three subfields – 1) Clustering 2) Association Rule Mining 3) Dimension Reduction
- Popular use cases of supervised learning are Business Prediction, Spam Classification, Medical Diagnosis, Stock Market Prediction etc.
- Popular use cases of unsupervised learning are Anomaly Detection, Fraud Detection, Market Basket Analysis and Customer Segmentation.
In The End…
I hope you won’t have any more confusion between supervised and unsupervised learning now. If by any chance you are confused about regression and classification, do check out this post –
Do share your feedback about this post in the comments section below. If you found this post informative, please share it and subscribe to us by clicking on the bell icon for quick notifications of new posts. And yes, don’t forget to join our new community MLK Hub and Make AI Simple together.