# Optimization in Machine Learning – A Gentle Introduction for Beginners

## Introduction

If you don’t come from an academic background and are a self-taught learner, chances are you have not come across optimization in machine learning. Even though it is the backbone of algorithms like linear regression, logistic regression, and neural networks, optimization is not talked about much outside academic circles.

In this post we will understand what optimization really means in the context of machine learning, in a simple and intuitive manner.

## What does optimization mean – A real-life example

Optimization means making changes and adjustments to reach your goal. For example, assume you have just entered college and are in your first semester. Since college life is new to you, you not only wish to score a good percentage in exams but also want to enjoy playing sports and spending time on social media.

1st Semester

Say you wish to score 90% in your first semester exams, but you end up spending more time on sports and social media and less on studies. As a result, you score far below 90%.

2nd Semester

After this bad experience, you sit down and plan to give more time to studies and less to other activities in the 2nd semester. With this new time division you end up scoring much better than in the 1st semester, but still not close to your goal of 90%.

3rd Semester

You again sit down and plan an even better time division between studies and other activities for your 3rd semester. This time, with improved time management, you end up scoring almost 90%, which was your goal.

My friend, what you are doing here is optimization. Every semester you calculate how far short you fell of your exam goal, and then you adjust the time you give to studies, sports, and social media so that you get closer to 90% in the next exams. The animation below illustrates this optimization process.

## Optimization in Machine Learning

The optimization used in supervised machine learning is not much different from the real-life example we saw above.

1. Here we have a model that initially sets random values for its parameters (more popularly known as weights). These parameters define a function, for example $$y = w_0 x_0 + w_1 x_1 + w_2 x_2$$, where $$x_0, x_1, x_2$$ are features (think study, play, and social media in the example above) and $$w_0, w_1, w_2$$ are weights (think of each as the time given to study, play, and social media in the example above). $$y$$ is the output or prediction (think of it as the exam score in the example above).
2. This function is used to make predictions on the training data set.
3. The predictions are then compared with the actual results in the training set. Both the predicted output and the actual output are sent to an error function, which calculates the offset, or error, between the two.
4. This error is sent to an optimizer. The optimizer calculates how much the current values of the weights should be changed so that the error is reduced and the predictions move toward the expected output.
5. The weights of the model are adjusted accordingly for the next iteration. Predictions are made on the training set again, the error is recalculated, and the optimizer again recommends a weight adjustment.
6. These iterations keep going until the error no longer changes much or we have reached the desired goal in terms of prediction accuracy. At this point the iterations are stopped.
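
To make the six steps above concrete, here is a minimal sketch in Python with NumPy. It is only one possible way to write such a loop, not a reference implementation. Everything specific in it is an assumption made for illustration: the toy data, the "true" weights, the function names (`predict`, `mean_squared_error`, `train`), the learning rate, the stopping tolerance, and the choice of mean squared error with plain gradient descent.

```python
import numpy as np

def predict(X, w):
    """Linear model: y = w0*x0 + w1*x1 + w2*x2 for every row of X."""
    return X @ w

def mean_squared_error(y_true, y_pred):
    """Error function: average squared gap between actual and predicted output."""
    return float(np.mean((y_true - y_pred) ** 2))

def train(X, y, learning_rate=0.1, max_iterations=5000, tolerance=1e-9, seed=0):
    """Plain gradient descent; all default values here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])            # step 1: random initial weights
    previous_error = float("inf")
    history = []                               # error recorded at every iteration
    for _ in range(max_iterations):
        y_pred = predict(X, w)                 # step 2: predict on the training set
        error = mean_squared_error(y, y_pred)  # step 3: compare with actual output
        history.append(error)
        if abs(previous_error - error) < tolerance:
            break                              # step 6: error has stopped changing
        # step 4: the gradient of the mean squared error tells the optimizer
        # in which direction, and how strongly, each weight affects the error
        gradient = (2.0 / len(y)) * (X.T @ (y_pred - y))
        w = w - learning_rate * gradient       # step 5: adjust the weights
        previous_error = error
    return w, history

# Made-up training data: 200 rows, 3 features, generated from known weights
data_rng = np.random.default_rng(42)
X = data_rng.uniform(0.0, 1.0, size=(200, 3))
true_w = np.array([3.0, -1.0, 0.5])
y = X @ true_w

w, history = train(X, y)
print("learned weights:", np.round(w, 3))
print("iterations run:", len(history))
print("first error:", history[0], "last error:", history[-1])
```

Because the toy targets were generated from `true_w` without noise, the learned weights should end up very close to those values, and the recorded error should shrink from the first iteration to the last. On real data the error usually levels off above zero.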

## Some points to consider

1. Error functions are also known as loss functions or cost functions. There are many types of cost functions, suited to different use cases.
2. The optimizer used in supervised learning is usually Gradient Descent. There are many variants of Gradient Descent.
3. When every iteration uses the full training set, as described above, each iteration is also known as an epoch. The number of iterations required to minimize the error may vary from a few to hundreds or thousands, depending on the training data and use case.
4. The model thus obtained is a trained model. The steps explained above are essentially the training steps of supervised learning. This trained model can then be used to make predictions on unseen test data to verify its accuracy.
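
As a small illustration of the first point, the sketch below shows two widely used cost functions: mean squared error, which is typical for regression, and binary cross-entropy, which is typical for yes/no classification. The function names, the clipping constant `eps`, and all the numbers are assumptions chosen for this example only.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Regression cost: average of the squared differences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Classification cost: heavily penalizes confident but wrong probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from exactly 0 and 1 so that log() stays finite
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

# Regression use case: predicting exam scores (made-up numbers)
exam_scores = [90.0, 75.0, 60.0]
close_guess = [88.0, 76.0, 61.0]
far_guess = [70.0, 90.0, 40.0]
mse_close = mean_squared_error(exam_scores, close_guess)
mse_far = mean_squared_error(exam_scores, far_guess)
print("MSE, close guess:", mse_close, "| far guess:", mse_far)

# Classification use case: predicting pass (1) or fail (0) as a probability
passed = [1, 0, 1]
mostly_right = [0.9, 0.1, 0.8]
mostly_wrong = [0.1, 0.9, 0.2]
bce_right = binary_cross_entropy(passed, mostly_right)
bce_wrong = binary_cross_entropy(passed, mostly_wrong)
print("Cross-entropy, mostly right:", bce_right, "| mostly wrong:", bce_wrong)
```

In both cases a smaller value means the predictions are closer to the actual results, which is exactly the quantity the optimizer tries to reduce.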

## In The End …

So this was an intuitive explanation of what optimization in machine learning is and how it works. I hope this was a good read for you, as usual.