In this article, we will explore the basics of four data science approaches:
Supervised Learning
Unsupervised Learning
Semisupervised Learning
Reinforcement Learning
We can classify Machine Learning Systems according to the amount and type of supervision they get during training.
There are four major approaches, as listed above. We will look at each of them in turn and try to understand its basics.
Supervised Learning: In supervised learning, the training data we feed to the algorithm includes the labels. Here, labels mean the desired outputs we are looking for.
For example: If we want to predict the price of a house from various input features, then the past data we feed into the machine learning algorithm will also contain a price variable, so that the model learns to predict the price of a house from past data. The next time we provide input features, the model will output a predicted price for the house.
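The house price example can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: a single input feature (area), invented numbers, and ordinary least squares as the learning algorithm. The names `fit_line` and `predict_price` are hypothetical helpers, not part of any library.

```python
# Toy labeled data: each house has one feature (area in square metres)
# and a label (its price). All numbers are invented for illustration.
areas = [50.0, 80.0, 100.0, 120.0, 150.0]
prices = [150_000.0, 240_000.0, 300_000.0, 360_000.0, 450_000.0]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# "Training": learn the relationship between area and price from past data.
slope, intercept = fit_line(areas, prices)

def predict_price(area):
    """Predict the price of a new house from its area."""
    return slope * area + intercept

print(f"Predicted price for 90 m2: {predict_price(90.0):.0f}")
```

In practice a library such as Scikit-Learn would handle many features at once; the point here is only that the labels (prices) are part of the training data.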
Basically, with the help of supervised learning, we perform two tasks:
Classification: The spam filter is a good example.
Regression: Predicting the price of a car, given a set of features (mileage, age, brand, etc.).
Note: Some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., an 80% chance of being ham).
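To make the note concrete, here is a minimal sketch of logistic regression written from scratch, assuming one hypothetical feature (the number of suspicious words in an email) and invented data. It is trained with plain batch gradient descent; the learning rate and number of epochs are arbitrary choices for this toy example.

```python
import math

# Hypothetical training set: number of suspicious words in an email,
# and a label (1 = spam, 0 = ham). Values are invented.
suspicious_words = [0, 1, 2, 3, 6, 7, 8, 9]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        weight -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        bias -= learning_rate * sum(errors) / n
    return weight, bias

weight, bias = train(suspicious_words, is_spam)

def spam_probability(word_count):
    """The model outputs a probability, which we then threshold at 0.5."""
    return sigmoid(weight * word_count + bias)

for count in (1, 8):
    p = spam_probability(count)
    label = "spam" if p >= 0.5 else "ham"
    print(f"{count} suspicious words -> P(spam) = {p:.2f} -> {label}")
```

The model itself produces a number between 0 and 1, as a regression would; it becomes a classifier only once we apply the 0.5 threshold.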
Here are some of the most important Supervised Learning Algorithms:
Linear Regression
Logistic Regression
Linear Discriminant Analysis (often used when the output has more than two classes, i.e. multiclass classification)
Decision Tree
Random Forest
Support Vector Machines (SVMs)
Ensemble Techniques (Bagging & Boosting)
Neural Networks
Unsupervised Learning: In unsupervised learning, as you might guess, the training data is unlabeled. Here we can say that the machine learning system tries to learn without any supervision or a teacher.
In this type of learning, the system itself finds patterns or relationships between the training data points and tries to cluster (group) them according to their similarity to each other.
Finally, it gives us information about the training data: how the system formed the clusters and on what basis. This information can then support business decisions.
For example: Suppose I have a lot of data about my blog's visitors, and I want to run a clustering algorithm to try to detect groups of similar visitors. When I run this algorithm on the training data, I might find that 50% of my visitors are females who love machine learning topics and generally read my blog during the weekends, while 30% are young college students who read my blog in the evening.
At the start of this analysis, we do not tell the algorithm which group a visitor belongs to. It finds those connections without our intervention or help.
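The blog visitor example can be sketched with a tiny K-Means implementation in plain Python. The visitor data (age, usual visiting hour) is invented, and the starting centroids are fixed by hand so the result is repeatable; real implementations usually choose them randomly or with a smarter initialization.

```python
# Hypothetical visitors described by (age, usual visiting hour). No labels.
visitors = [
    (19, 20), (21, 21), (20, 22), (22, 19),   # look like evening readers
    (35, 10), (40, 11), (38, 9), (42, 12),    # look like morning readers
]

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    labels = []
    for p in points:
        distances = [squared_distance(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Update step: move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep a centroid that lost all members
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Assumption: we start from the first and last visitor as initial centroids.
centroids, labels = k_means(visitors, [visitors[0], visitors[-1]])
print(labels)
print(centroids)
```

Note that the algorithm is never told which group a visitor belongs to; the group numbers come out of the data alone.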
These are some of the most important Unsupervised Learning Algorithms:
1. Clustering:
K-Means
DBSCAN
Hierarchical Cluster Analysis (HCA)
2. Anomaly Detection and Novelty Detection:
One-Class SVM
Isolation Forest
3. Visualization and Dimensionality Reduction:
Principal Component Analysis (PCA)
Kernel PCA
Locally Linear Embedding (LLE)
t-distributed Stochastic Neighbor Embedding (t-SNE)
4. Association Rule Learning:
Apriori
Eclat
Some Important Unsupervised Tasks:
Visualization algorithms are also good examples of unsupervised learning algorithms. Here we feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of our data that can easily be plotted.
Dimensionality reduction also comes under unsupervised learning. Here our goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. This is called feature extraction.
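As a rough sketch of feature extraction, the snippet below merges two correlated features into one by projecting them onto their first principal component, which is the idea behind PCA. It assumes only two features so the direction can be computed with a closed-form formula; the car data and the name `wear` are invented for illustration.

```python
import math

# Hypothetical cars: age in years and mileage in units of 10,000 km.
# The two features are strongly correlated.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
mileages = [2.1, 3.9, 6.2, 7.8, 10.0]

def center(values):
    """Subtract the mean so the feature is centered on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

x = center(ages)
y = center(mileages)
n = len(x)

var_x = sum(a * a for a in x) / n
var_y = sum(b * b for b in y) / n
cov_xy = sum(a * b for a, b in zip(x, y)) / n

# For a 2x2 covariance matrix, the axis of largest variance has this angle.
angle = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
direction = (math.cos(angle), math.sin(angle))

# One merged feature per car: its coordinate along that axis.
wear = [a * direction[0] + b * direction[1] for a, b in zip(x, y)]

# Share of the total variance kept by the single merged feature.
explained = (sum(w * w for w in wear) / n) / (var_x + var_y)
print(f"direction = ({direction[0]:.3f}, {direction[1]:.3f})")
print(f"variance kept = {explained:.4f}")
```

Here two columns become one while almost all of the variance is kept, which is what "simplify the data without losing too much information" means in practice.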
Another important unsupervised task is anomaly detection. Here the system is shown mostly normal instances during training, so it learns to recognize them. When it sees a new instance, it can tell whether it looks like a normal one or whether it is likely an anomaly. Here is a little task for you: try to find out the difference between anomaly detection and novelty detection.
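A very simple way to illustrate the idea is a z-score rule: learn the mean and spread of normal instances, then flag new values that fall too far away. This is a deliberately basic stand-in, not One-Class SVM or Isolation Forest, and the response-time data and the threshold of 3 standard deviations are assumptions made for this sketch.

```python
import statistics

# Hypothetical "normal" instances seen during training,
# e.g. server response times in milliseconds.
normal_response_times = [102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0]

# "Training": summarize what normal looks like.
mean = statistics.mean(normal_response_times)
std = statistics.stdev(normal_response_times)

def is_anomaly(value, threshold=3.0):
    """Flag a new instance lying more than `threshold` standard deviations from the mean."""
    return abs(value - mean) / std > threshold

for new_value in (101.0, 130.0):
    verdict = "anomaly" if is_anomaly(new_value) else "normal"
    print(f"{new_value} ms -> {verdict}")
```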
Association rule learning is also a common unsupervised task in which the goal is to dig into large amounts of data and discover interesting relations between attributes.
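The relations that association rule learning looks for are usually measured with support and confidence. The sketch below counts them by brute force on a handful of invented shopping baskets; algorithms such as Apriori exist to find frequent itemsets efficiently on large data, which this toy code does not attempt.

```python
# Hypothetical shopping baskets (one set of items per purchase).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)

def support(itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction also containing `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

print("support(bread, butter) =", support({"bread", "butter"}))
print("confidence(bread -> butter) =", confidence({"bread"}, {"butter"}))
```

A rule such as "people who buy bread also tend to buy butter" is considered interesting when both numbers are high enough.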
Semisupervised Learning: Some machine learning algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.
For example: Consider Google Photos. When we upload all our family photos to Google Photos, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos. This is the supervised part of the algorithm.
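The two parts of the photo example can be imitated with a toy sketch: first group similar faces without labels, then spread one user-provided name per person across each group. Everything here is hypothetical, including the 2D "face embeddings", the distance threshold, and the simple threshold-based grouping; it says nothing about how any real photo service works.

```python
import math

# Hypothetical 2D "face embeddings", one per detected face. No labels yet.
faces = [(0.10, 0.20), (0.90, 0.80), (0.12, 0.18), (0.88, 0.85), (0.11, 0.22)]

def cluster_faces(points, max_distance=0.2):
    """Unsupervised part: put each face in the first cluster whose
    representative (its first member) is close enough, else start a new one."""
    representatives = []
    cluster_ids = []
    for p in points:
        for cluster_id, rep in enumerate(representatives):
            if math.dist(p, rep) <= max_distance:
                cluster_ids.append(cluster_id)
                break
        else:
            representatives.append(p)
            cluster_ids.append(len(representatives) - 1)
    return cluster_ids

cluster_ids = cluster_faces(faces)

# Supervised part: the user names just one face per person (face index -> name).
user_labels = {0: "Alice", 1: "Bob"}

# Propagate each name to every face in the same cluster.
cluster_names = {cluster_ids[i]: name for i, name in user_labels.items()}
names = [cluster_names.get(cluster_id, "unknown") for cluster_id in cluster_ids]
print(names)
```

Two labels were enough to name all five faces, because the clustering step had already done most of the work.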
Reinforcement Learning: It is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
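The agent, reward, and policy vocabulary can be seen in a small Q-learning sketch. The environment is an invented corridor of five cells where the agent starts on the left and is rewarded only for reaching the rightmost cell. Q-learning is just one of several reinforcement learning algorithms, and the learning rate, discount factor, and exploration rate below are arbitrary choices for this toy setting.

```python
import random

N_STATES = 5                      # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

rng = random.Random(0)            # fixed seed so runs are repeatable
q_table = [[0.0, 0.0] for _ in range(N_STATES)]   # one row per state

def step(state, action):
    """Environment: move one cell; reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def choose_action(state):
    """Epsilon-greedy: usually take the best-known action, sometimes explore."""
    left, right = q_table[state]
    if rng.random() < EPSILON or left == right:
        return rng.randrange(2)
    return 0 if left > right else 1

for _ in range(500):              # 500 training episodes
    state = 0
    while state != N_STATES - 1:
        action = choose_action(state)
        next_state, reward = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        target = reward + GAMMA * max(q_table[next_state])
        q_table[state][action] += ALPHA * (target - q_table[state][action])
        state = next_state

# The learned policy: the preferred action in each non-goal cell.
policy = [ACTIONS[0 if q[0] > q[1] else 1] for q in q_table[:-1]]
print(policy)
```

Nobody tells the agent that moving right is correct; it discovers this only through the rewards it collects over many episodes.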
That is all for this article. Here we discussed the four major approaches in Data Science and tried to understand their basics along with some good real-world examples. I hope this blog is helpful to you.
To write this article, I referred to the book Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow.
In my upcoming blogs, I will cover more topics in Data Science and try to explain them in an easy and descriptive way.
Was this blog helpful? Let us know in the comments below.
Till then, thanks for giving your time and reading my creation.