In this post we will see what exactly linear regression is, which will help us better understand neural networks (NNs).
This post can be skipped, but I think it provides a good foundation for understanding NNs.
So what exactly is linear regression?
Linear regression is a way to predict an outcome, say y, based on a single variable x. This gives a very basic model like
y = w * x + b
here y = the outcome we are trying to predict
x = the input we have
w = the weight, which we need to optimize
b = the bias (the value of y when x is 0), which is optimized along with w
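The model above can be written as a one-line function. This is only a minimal sketch: the name `predict` and the coefficient values are made up for illustration and do not come from any dataset.

```python
def predict(x, w, b):
    """Return the model's prediction y = w * x + b for a single input x."""
    return w * x + b


# Hypothetical coefficients, chosen only to show the arithmetic.
w, b = 2.0, 1.0
print(predict(3.0, w, b))  # 2.0 * 3.0 + 1.0 = 7.0
```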
To understand this better, let’s assume we have a data set like this
This is a chart of “cricket chirps per minute” vs “temperature”. If we take x to be “temperature”, then we need to build a linear model that accurately predicts the value of y, i.e. “cricket chirps per minute”.
Since our model is linear, it would be a straight-line graph like this
The goal is to find the best estimates for the coefficients to minimize the errors in predicting y from x.
Source: https://developers.google.com/machine-learning/crash-course/descending-into-ml/linear-regression
How to do this?
To start off, we need to assume some values for our coefficients or, in the case of linear regression, we can use something like this:
Simple regression is great, because rather than having to search for values by trial and error or calculate them analytically using more advanced linear algebra, we can estimate them directly from our data.
This link gives a much more technical and detailed mathematical treatment of linear regression: https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/
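As a rough sketch of what "estimating the coefficients directly from the data" can look like, the standard least-squares formulas for one variable are w = covariance(x, y) / variance(x) and b = mean(y) - w * mean(x). The function name `fit_simple_regression` and the data points are hypothetical; the points are generated from y = 2x + 1 so the result is easy to check.

```python
def fit_simple_regression(xs, ys):
    """Estimate w and b for y = w * x + b using the least-squares formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance of x and y).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations (proportional to the variance of x).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Made-up data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = fit_simple_regression(xs, ys)
print(w, b)  # 2.0 1.0
```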
For simplicity, let’s assume we choose some random coefficients and, based on our initial data, i.e. the training data, we calculate the output value.
Let’s say our training data has 100 points.
Using the random values of our coefficients, we pass in our input data, i.e. x, and get an output y.
Estimating Error Or Loss
Now we look at our output and compare it with the expected output, which we already have from our training data. We calculate the loss using the Mean Squared Error (MSE) function. If you want to know what MSE is and why we use it, check out this link, as it gives a good explanation: https://developers.google.com/machine-learning/crash-course/descending-into-ml/training-and-loss
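To make the loss calculation concrete, here is a small sketch of MSE: the average of the squared differences between expected and predicted outputs. The function name, the starting coefficients and the four data points are all made up for illustration.

```python
def mean_squared_error(ys_true, ys_pred):
    """Average of the squared differences between expected and predicted values."""
    n = len(ys_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(ys_true, ys_pred)) / n


# Made-up training data lying on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# An arbitrary starting guess for the coefficients.
w, b = 1.0, 0.0
preds = [w * x + b for x in xs]

loss = mean_squared_error(ys, preds)
print(loss)  # errors are 2, 3, 4, 5 -> (4 + 9 + 16 + 25) / 4 = 13.5
```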
Next, based on this loss, we need to optimize our coefficients to get a better model.
Gradient Descent
Once we have calculated the MSE for our data, we would have a graph similar to this.
This exact graph would be produced only if we could calculate the loss for all possible weights. That is not feasible, since it would mean calculating the MSE for every possible number, so we work only with the weights we currently have. We assume the MSE has a single minimum and, since we cannot compute the loss for every possible weight, we find the gradient (slope) of the loss at the point we have.
If the slope is negative, i.e. the loss goes down as the weight increases, then increasing the weight moves us towards the minimum of the loss function; if the slope is positive, we decrease the weight instead. So we change the weights in small increments in the direction of the negative slope and calculate the loss again.
This iteration is called gradient descent: repeatedly changing the weights in the direction of decreasing slope to reach the minimum value.
Source: https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent
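The iteration described above can be sketched as a short loop. This is a minimal illustration, not a production implementation: the function name, the learning rate of 0.05, the 2000 steps and the data are all assumptions picked so that this toy example converges. The gradients are the derivatives of MSE with respect to w and b.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0  # arbitrary starting coefficients
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training point at the current w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Derivatives of MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move a small step in the direction of the negative slope.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
w, b = gradient_descent(xs, ys)
print(w, b)  # should end up very close to 2 and 1
```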
Learning Rate
The learning rate is a number that scales the gradient and so controls the size of the steps our model takes in the direction of the negative slope when computing the next weight.
This link explains this very well: https://developers.google.com/machine-learning/crash-course/reducing-loss/learning-rate
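A quick way to see why the step size matters is to run gradient descent on a toy loss with a known minimum. The loss (w - 3)^2 below is a hypothetical stand-in chosen for illustration, not the MSE of any real data, and the two learning rates are arbitrary examples of "small enough" and "too large".

```python
def descend(learning_rate, steps=50, w=0.0):
    """Run gradient descent on the toy loss (w - 3)^2, whose minimum is at w = 3."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w -= learning_rate * grad
    return w


small = descend(learning_rate=0.1)  # each step shrinks the distance to 3
large = descend(learning_rate=1.1)  # each step overshoots and lands further away

print(small)  # very close to 3
print(large)  # far away from 3: the steps diverged
```

With the small rate each update moves w a bit closer to the minimum; with the large rate each update jumps over the minimum and ends up further from it than before.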
Stochastic Gradient Descent
Until now we have been calculating the MSE on our full input data. But often the data is so big, with millions of data points, that even calculating the initial MSE can take a lot of time and computational power.
The solution is to randomly choose a much smaller sample of the data to calculate the gradient at each step. This is called Stochastic Gradient Descent (SGD). Strictly speaking, SGD uses a single random example per step, and using a small random batch is called mini-batch SGD.
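The idea of using a random sample per step can be sketched as follows. This is an illustrative mini-batch variant under assumed settings: the function name `sgd`, the batch size of 10, the learning rate, the step count and the synthetic data are all made up for this example.

```python
import random


def sgd(xs, ys, learning_rate=0.1, steps=5000, batch_size=10, seed=0):
    """Fit y = w * x + b using the gradient of a small random batch at each step."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    indices = range(len(xs))
    for _ in range(steps):
        # Pick a small random subset of the training data.
        batch = rng.sample(indices, batch_size)
        errors = [w * xs[i] + b - ys[i] for i in batch]
        # MSE gradient computed on the batch only, not on the full data.
        grad_w = 2.0 / batch_size * sum(e * xs[i] for e, i in zip(errors, batch))
        grad_b = 2.0 / batch_size * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data: 100 points lying on y = 2x + 1.
xs = [i / 100 for i in range(100)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd(xs, ys)
print(w, b)  # should end up close to 2 and 1
```

Each step only touches 10 of the 100 points, which is the saving that matters when the data has millions of points.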
Using the above techniques, we are able to fit our model to the data. Once we have a trained model ready, we can predict future values.
The predictive model we saw is a very simple one: it is linear and has just one variable. But understanding it helps to get the basics clear.
Further reading resources