Chapter 3: Machine Learning Types and Their Applications

Python Machine Learning and Data Science — 101 Series

Souravi Sinha
May 17, 2020

Isn’t it all getting serious now?

But before we get into the types of machine learning, we should know the types of data they ingest. There are basically 2 types of data:

Labeled data

Unlabelled data

Labeled data is data that has been tagged with one or more labels. Labelling typically takes a set of unlabelled data and augments each piece of it with meaningful, informative tags.

For example, labels might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, whether the dot in an x-ray is a tumor, etc.

Labelled vs Unlabelled data

Another Example -

Let’s say you post a group pic on FB or Instagram. If you tag all or most of your friends present in the pic, that’s labeled data. If you don’t tag anyone, that’s unlabelled data.
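The same idea as a tiny Python sketch. The file names and friend names are made up for illustration, and `is_labeled` is a hypothetical helper, not part of any library:

```python
# Hypothetical photo records: each one is just input data (a file name here).
unlabeled_photos = ["beach.jpg", "birthday.jpg", "hike.jpg"]

# The same photos after tagging: every input is now paired with its labels.
labeled_photos = [
    ("beach.jpg", ["Asha", "Ravi"]),
    ("birthday.jpg", ["Asha"]),
    ("hike.jpg", ["Ravi", "Meera"]),
]


def is_labeled(record):
    """Return True if the record pairs an input with at least one label."""
    return isinstance(record, tuple) and len(record) == 2 and len(record[1]) > 0


print([is_labeled(p) for p in unlabeled_photos])  # [False, False, False]
print([is_labeled(p) for p in labeled_photos])    # [True, True, True]
```

The only difference between the two lists is the extra "answer" attached to each photo, and that answer is exactly what a supervised algorithm learns from.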

Here come the types of ML:

Supervised Learning

labelled data fed for supervised learning

So, supervised learning is like learning in the presence of a teacher.

If we have a function such as

Y = f(x)

where x is our known data and Y is the outcome, then according to Maths (which I both loved and feared as a child ;), finding f() is no big deal. That is what supervised learning does: it finds f() from known pairs of x and Y.

So, the algorithm finds relationships between the input and the output, and in doing so determines f() from x and Y. The goal is to approximate the mapping function so well that when you have new input data (x), you can predict the output variable (Y) for that data.
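To make "finding f()" concrete, here is a minimal sketch that assumes f is a straight line and estimates it by least squares. The numbers are made up (they follow Y = 2x + 1), and real problems rarely fit this neatly:

```python
# Known inputs (x) and outcomes (Y). Made-up numbers that follow Y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Estimate slope and intercept of Y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def f(x):
    """The learned approximation of the mapping function."""
    return slope * x + intercept


print(f(6.0))  # 13.0, a prediction for an input the algorithm has not seen
```

Here the "teacher" is the list `ys`: it tells the algorithm the right answer for every known x.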

This type of machine learning algorithm is trained on labeled data. Here, the ML algorithm is given a small training dataset to work with. It gives the algorithm a basic idea of the problem, the solution, and the data points to be dealt with.

This training dataset is a subset of the bigger dataset, and it is very similar to the final dataset.

Then the solution is tested on a testing dataset, which is again similar to the training dataset. Once the test is successful, the solution is used with the final dataset, which it learns from in the same way as it did from the training dataset.
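The train-then-test routine described above can be sketched in a few lines. Everything here is an assumption made for illustration: the "hours studied vs exam score" data is invented, the model is a deliberately tiny y = w * x fit, and `train_test_split` is a hand-written helper (libraries such as scikit-learn ship their own version):

```python
import random

# Made-up dataset: (hours studied, exam score), roughly score = 10 * hours.
hours = range(1, 11)
noise = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2, 0.4, -0.1, -0.5]
data = [(h, 10.0 * h + e) for h, e in zip(hours, noise)]


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into training and testing subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_weight(rows):
    """Fit y = w * x by least squares (a deliberately tiny model)."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def mean_absolute_error(w, rows):
    """Average size of the prediction error on the given rows."""
    return sum(abs(y - w * x) for x, y in rows) / len(rows)


train_rows, test_rows = train_test_split(data)
w = fit_weight(train_rows)                      # learn only from the training subset
test_error = mean_absolute_error(w, test_rows)  # check on rows it never saw
print(len(train_rows), len(test_rows), round(w, 2), round(test_error, 2))
```

Keeping the test rows away from the fitting step is the whole point: a low error on data the model never saw is better evidence than a low error on data it memorised.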

Basically, the algorithm learns under supervision, like that of a teacher who knows the end result. The algo is corrected and rerun until the outcome is satisfactory. (OH MY MY!! It does sound like exams.)

Supervised learning problems can be further grouped into regression and classification problems.

  • Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
  • Regression: A regression problem is when the output variable is a real value, such as an amount in dollars or a weight.
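The two problem types can share the same machinery and differ only in the kind of output. The sketch below uses a simple k-nearest-neighbors rule for both; the tumor sizes, house areas and prices are made-up numbers, and the function names are my own:

```python
from collections import Counter

# Made-up training rows: (tumor size in cm, label) and (house area in m^2, price).
classification_rows = [(0.5, "no disease"), (0.8, "no disease"), (1.0, "no disease"),
                       (2.5, "disease"), (3.0, "disease"), (3.4, "disease")]
regression_rows = [(50.0, 100.0), (60.0, 120.0), (70.0, 140.0),
                   (80.0, 160.0), (90.0, 180.0)]


def nearest(rows, x, k=3):
    """Return the outputs of the k training rows whose input is closest to x."""
    return [y for _, y in sorted(rows, key=lambda row: abs(row[0] - x))[:k]]


def classify(rows, x, k=3):
    """Classification: the output is a category, chosen by majority vote."""
    return Counter(nearest(rows, x, k)).most_common(1)[0][0]


def regress(rows, x, k=3):
    """Regression: the output is a real value, here the neighbors' average."""
    neighbors = nearest(rows, x, k)
    return sum(neighbors) / len(neighbors)


print(classify(classification_rows, 2.8))  # disease
print(regress(regression_rows, 70.0))      # 140.0
```

Notice that `classify` returns one of the labels it was shown, while `regress` can return any number, including one that never appeared in the training rows.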

Unsupervised Learning

Unsupervised categorical data

Unsupervised learning is where you only have input data (X) and no corresponding output variables.

The goal for unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

This is called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

Unsupervised machine learning works with unlabelled data. This means that no human labor is required to label the data, which allows the program to work on much larger datasets.

Unsupervised learning works by creating hidden structures. Relationships between data points are perceived by the algorithm in an abstract manner, with no input required from human beings.

The creation of these hidden structures is what makes unsupervised learning algorithms versatile. Unsupervised learning algorithms can adapt to the data by dynamically changing hidden structures. This leaves more room for development after deployment than supervised learning algorithms offer.

Unsupervised learning problems can be further grouped into clustering and association problems.

  • Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
  • Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
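Both ideas fit in a short sketch. The clustering part is a stripped-down 1-D k-means with two clusters, and the association part computes the "confidence" of a rule such as "people who buy bread also buy butter". The spend figures and baskets are invented, and both functions are simplified illustrations rather than production algorithms:

```python
# Made-up customer data: monthly spend, and a few shopping baskets.
spend = [12.0, 15.0, 14.0, 95.0, 102.0, 98.0]
baskets = [{"bread", "butter"}, {"bread", "butter", "jam"},
           {"bread", "milk"}, {"milk", "jam"}]


def two_means(values, iterations=10):
    """Clustering sketch: 1-D k-means with k=2, started from the min and max."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to its closest center.
            closest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[closest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


def confidence(baskets, x, y):
    """Association sketch: of the baskets containing x, the share that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)


centers, groups = two_means(spend)
print(groups)                                  # low spenders and high spenders
print(confidence(baskets, "bread", "butter"))  # 2 of the 3 bread baskets have butter
```

Nobody told the code which customers are "low" or "high" spenders, or that bread and butter go together. Both patterns come out of the data alone.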

Semi-Supervised Learning

Problems where you have a large amount of input data (X) and only some of the data is labeled (Y) are called semi-supervised learning problems.

These problems sit in between supervised and unsupervised learning. Many real-world machine learning problems fall into this area.
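One common way to use the few labels you do have is to let them spread to nearby unlabeled points. The sketch below is a self-training style illustration with a nearest-neighbor rule; the data points, the label names and the `self_train` function are all assumptions of mine, and real semi-supervised methods are more careful about how much they trust each guess:

```python
# Made-up 1-D data: a few labeled points and many unlabeled ones.
labeled = [(1.0, "small"), (9.0, "large")]
unlabeled = [1.5, 2.0, 2.8, 7.0, 8.2, 8.8]


def self_train(labeled, unlabeled):
    """Repeatedly label the unlabeled point that lies closest to an
    already-labeled point, copying that neighbor's label."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the (unlabeled point, labeled neighbor) pair with the smallest gap.
        x, (_, label) = min(
            ((u, row) for u in remaining for row in labeled),
            key=lambda pair: abs(pair[0] - pair[1][0]),
        )
        labeled.append((x, label))  # treat the guessed label as if it were real
        remaining.remove(x)
    return labeled


result = dict(self_train(labeled, unlabeled))
print(result)
```

With only two hand-made labels, all eight points end up tagged, which is the appeal of this setting when labelling by hand is expensive.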

Reinforcement Learning

Reinforcement learning works on rewards and punishments. It features an algorithm that improves upon itself and learns from new situations using a trial-and-error method. Favorable outputs are encouraged or ‘reinforced’, and non-favorable outputs are discouraged or ‘punished’.

In every iteration of the algorithm, the output is given to the interpreter, which decides whether the outcome is favorable or not. If it’s the correct solution, the interpreter reinforces the solution by providing a reward to the algorithm. If the outcome is not favorable, the algorithm is forced to reiterate until it finds a better result. In most cases, the reward system is directly tied to the effectiveness of the result.

In typical reinforcement learning use-cases, such as finding the shortest route between two points on a map, the solution is not an absolute value. Instead, it takes on a score of effectiveness, expressed in a percentage value. The higher this percentage value is, the more reward is given to the algorithm. Thus, the program is trained to give the best possible solution for the best possible reward.
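The reward loop can be sketched with one of the simplest trial-and-error schemes, an epsilon-greedy learner choosing between three actions. This is an illustration under my own assumptions: the reward values stand in for the "score of effectiveness", they are fixed rather than noisy, and the function and variable names are hypothetical:

```python
import random

# Made-up rewards for three actions; the learner never reads this list directly,
# it only sees the reward for the action it just tried.
TRUE_REWARDS = [0.2, 0.5, 0.9]


def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=0):
    """Mostly repeat the best-known action, but explore a random one
    with probability epsilon."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # current guess of each action's reward
    counts = [0] * len(rewards)       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))      # explore
        else:
            action = estimates.index(max(estimates))  # exploit
        reward = rewards[action]                      # feedback from the interpreter
        counts[action] += 1
        # Move the estimate toward the observed reward (running average).
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy(TRUE_REWARDS)
best_action = estimates.index(max(estimates))
print(best_action, counts)
```

The learner starts out knowing nothing, tries things, and gradually settles on the action that pays best. The occasional random exploration is what keeps it from getting stuck on the first action that gave any reward at all.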
