Supervised Learning

Supervised learning is a category of machine learning problems in which the system is trained to map input data to output data, based on example pairs of inputs and outputs in the training data.

What does that mean? Let's say we are building an app that can classify images of dogs and cats. We first provide the system with a lot of different images, each with a label (in this case, cat or dog). The system then learns to map those images to the right labels. After that, when someone gives it a new image, it can correctly classify whether the animal in the image is a cat or a dog. This type of problem comes under supervised learning. Supervised learning finds patterns in the training data and uses those patterns to predict the correct output for new, unseen data.
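To make this concrete, here is a minimal sketch of that workflow in Python. The two-number feature vectors, the labels, and the choice of scikit-learn's k-nearest-neighbours classifier are all illustrative assumptions on my part; a real image classifier would work from the pixels themselves.

```python
# A minimal supervised learning sketch (illustrative assumptions):
# each image is represented by two hypothetical numeric features,
# and a k-nearest-neighbours classifier learns the mapping
# from features to labels.
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: (feature vector, label) pairs.
X_train = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y_train = ["dog", "dog", "cat", "cat"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)           # learn the input -> output mapping

print(model.predict([[0.15, 0.85]]))  # -> ['cat']
```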

Supervised learning problems can be of two types: 1. Regression 2. Classification. In this article we will talk about Regression only.

Regression problems have input variables that are mapped to continuous output values, for example predicting the price of a used car. This could involve multiple input parameters, such as the total distance travelled, the number of times the car was serviced, how old the car is, and so on. These parameters are usually called features, and in this case the price of the car is called the label.
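As a rough sketch, the used-car data could be arranged like this in Python; every number below is made up purely for illustration.

```python
# Hypothetical used-car training data.
# Each row of X holds one car's features:
#   [km travelled, number of services, age in years]
X = [
    [40_000, 4, 3],
    [120_000, 9, 8],
    [15_000, 1, 1],
]
# y holds the labels: the price of each car. Prices are
# continuous values, which is what makes this regression.
y = [9_500, 3_200, 14_000]
```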

One of the most common algorithms used to model regression problems is linear regression.

Linear regression is a very simple kind of algorithm. Let's use a sample data set that we can model using linear regression.

The data is pretty simple. On one side we have a feature called x, which is mapped to a label called y: every value of x corresponds to a value of y. We can plot this data on a graph, with a line and its equation drawn over the points (for now, don't bother about that line and the equation; they will soon become very clear).
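Since the original picture is not reproduced here, a stand-in plot can be generated in Python; the specific numbers are assumed values chosen to lie exactly on a straight line.

```python
import matplotlib.pyplot as plt

# Assumed sample data, lying on the line y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

plt.scatter(x, y)                      # the data points
plt.plot(x, [2 * xi + 1 for xi in x])  # the line through them
plt.xlabel("x (feature)")
plt.ylabel("y (label)")
plt.show()
```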

With our data plotted, linear regression will try to find a line that best generalises the data in order to capture its pattern. As you can see, there is already a line that perfectly generalises this data, and the equation y = mx + c is used to represent that line.

We can clearly see from the equation that m (the slope) and c (the intercept) are the variables that govern the line, and if we can somehow find the correct values for these two variables, we will have the line that best fits our data.

Now the question is: how can we find the values of these two variables? Think about it this way: how is the best-fitting line different from all the other lines? Try thinking about it.

The answer is that for the best-fitting line, the average distance from each data point to the line is at its minimum.

So to find our ideal line, all we have to do is find the values of m and c such that the average distance between each data point and the line is minimal.

To minimise this average, we first need the error of each point: the vertical distance between the point's label and the value the line predicts for it. This error is conventionally squared, so that points above and below the line don't cancel each other out.

That gives us an error value for each data point in the training set. Next, we average these error values over every data point in the training data.

Finally we have our average error value. This function of m and c is called the loss (or cost) function, and our task now is to minimise it.
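Written out, this loss is the mean squared error. Here is a minimal sketch of it in Python, reusing the assumed data from the plotting sketch above:

```python
def mse_loss(x, y, m, c):
    """Mean squared error of the line y = m*x + c over the data."""
    n = len(x)
    # For each point: the vertical error between the label yi and the
    # line's prediction m*xi + c, squared so that errors above and
    # below the line don't cancel. Then take the average.
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y)) / n

print(mse_loss([1, 2, 3, 4, 5], [3, 5, 7, 9, 11], m=2, c=1))  # 0.0, a perfect fit
print(mse_loss([1, 2, 3, 4, 5], [3, 5, 7, 9, 11], m=1, c=0))  # 18.0, a worse fit
```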

To minimise the loss function we use a very popular algorithm called gradient descent. The job of gradient descent is to find the values of m and c that give the minimum value of the cost function among all possible values of m and c.

A surface plot of the cost function helps in understanding how gradient descent works. The coloured surface gives the value of the cost function J for each pair of values of m and c (such plots often label m and c as theta1 and theta2). The blue region represents the region where the value of the cost function is lowest. We first give m and c some initial values (usually zero) and then iteratively decrease the value of J by changing m and c until we reach the point where J is at its minimum.

At this point someone could ask: how exactly are we going to decrease the value of the loss function? The answer is simple: after each iteration we update m and c by subtracting a certain factor from each of them.

Now, let's try to understand what this factor is. The factor we subtract is the product of two terms: alpha and the partial derivative of the loss function. While moving towards the lowest value of the loss function, we must know how big our step should be and in which direction to take it (in our case, downhill). Alpha, which is called the learning rate, tells us how big a step to take (note: alpha should not be too big, otherwise we will overshoot the desired position), and the partial derivative of the loss function with respect to each of the variables tells us the direction.
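Putting all the pieces together, here is a minimal gradient descent sketch for this problem. The learning rate and iteration count are assumed values, and the gradient expressions are the standard partial derivatives of the mean squared error with respect to m and c:

```python
def gradient_descent(x, y, alpha=0.01, iterations=1000):
    """Fit the line y = m*x + c by gradient descent on the mean squared error."""
    n = len(x)
    m, c = 0.0, 0.0  # initial values for both variables (usually zero)
    for _ in range(iterations):
        # Partial derivatives of the loss J(m, c) = mean((y - (m*x + c))^2):
        #   dJ/dm = -(2/n) * sum(x_i * (y_i - (m*x_i + c)))
        #   dJ/dc = -(2/n) * sum(y_i - (m*x_i + c))
        dm = -(2 / n) * sum(xi * (yi - (m * xi + c)) for xi, yi in zip(x, y))
        dc = -(2 / n) * sum(yi - (m * xi + c) for xi, yi in zip(x, y))
        # Step against the gradient; alpha controls the step size.
        m -= alpha * dm
        c -= alpha * dc
    return m, c

m, c = gradient_descent([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(m, c)  # converges towards m = 2, c = 1
```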

I appreciate your effort if you are still reading this article. There are a lot of topics that I haven't talked about here, such as overfitting and underfitting, when gradient descent works well and when it doesn't, and alternatives to gradient descent such as Newton's method. There are a lot of interesting things to talk about in machine learning, and this article just scratches the surface. If you liked it, please do give me some claps. I would love to hear your thoughts on this article in the comments or in my personal inbox at bw99214@gmail.com.
