In this post we will walk through how neural networks work. This post will be theoretical, but I believe it’s important to understand neural networks (NNs) before we actually start coding.
Understanding NNs and machine learning (ML) in depth requires a solid grasp of statistics and advanced math. Since my posts are oriented towards developers and implementation, I will avoid the heavy math as much as possible and leave external links for a deeper understanding.
Basics (Layers & Nodes)
Neural networks are composed of layers, and each layer is composed of nodes. Even the most basic neural network has an input layer, one or more hidden layers, and an output layer.
Any layer between the input and output layers is called a hidden layer. There can be multiple hidden layers, and each layer can have any number of nodes.
Next, in a fully connected network, every node in one layer is connected to every node in the next layer, i.e. data flows from one layer to the next.
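A short Python sketch makes the “every node to every node in the next layer” idea concrete. It is an illustration, not code from any library, and it stores one weight matrix per pair of adjacent layers. The function name `init_dense_layers`, the layer sizes, and drawing initial weights uniformly from -1 to 1 are all assumptions chosen for illustration.

```python
import random

def init_dense_layers(layer_sizes, seed=0):
    """Create one weight matrix per pair of adjacent layers (hypothetical helper).

    weights[k][j][i] is the weight on the connection from node i in
    layer k to node j in layer k + 1, so every node is connected to
    every node in the next layer.
    """
    rng = random.Random(seed)
    weights = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        # One weight per (current node, next node) pair; the range is an assumption.
        matrix = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        weights.append(matrix)
    return weights

# Hypothetical network: 3 input nodes, two hidden layers (4 and 5 nodes), 2 output nodes.
sizes = [3, 4, 5, 2]
weights = init_dense_layers(sizes)
for k, matrix in enumerate(weights):
    n_connections = len(matrix) * len(matrix[0])
    print(f"layer {k} -> layer {k + 1}: {n_connections} connections")
# prints:
# layer 0 -> layer 1: 12 connections
# layer 1 -> layer 2: 20 connections
# layer 2 -> layer 3: 10 connections
```

The number of connections between two layers is the product of their node counts. This is why wide layers quickly add many weights.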
Every connection between two nodes has a weight (w1, w2, w3, etc.). We can think of a weight as the signal strength between two nodes: when data is passed along a connection, it is multiplied by the weight. A weight of 0 passes no signal, and the larger the weight’s magnitude, the stronger the signal. Weights are not limited to the range 0 to 1, though. They are real numbers and can also be negative or greater than 1.
Next, each node takes the weighted sum of its inputs and passes it through an activation function. The activation function decides how strongly the node is activated. For example, the sigmoid function maps any input to a value between 0 and 1. The resulting value is the node’s output.
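A single node’s computation fits in a few lines of Python. This sketch assumes a sigmoid activation, which is one common choice among several. The inputs and weights are made up, and `node_output` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights):
    """Weighted sum of a node's inputs, passed through a sigmoid activation."""
    # Each incoming value is multiplied by the weight on its connection.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(weighted_sum)

# Hypothetical values: three incoming signals and their connection weights.
inputs = [1.0, 0.5, -2.0]
weights = [0.4, 0.8, 0.1]
# Weighted sum is 0.4 + 0.4 - 0.2 = 0.6, so this prints roughly 0.646.
print(node_output(inputs, weights))
```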
Once we have the activation function’s output for a node, it is passed on to the next layer. The same is repeated for every node in every layer until we reach the output layer.
One full forward pass is complete when a single input sample has traveled from the input layer to the output layer.
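Chaining that per-node computation layer by layer gives a forward pass. The sketch below uses a hypothetical 2-3-1 network with hand-picked weights and sigmoid activations everywhere. Both the weights and the `forward` helper are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    """Run one forward pass through fully connected sigmoid layers.

    layers[k][j] holds the weights from every node in layer k
    to node j in layer k + 1.
    """
    activations = inputs
    for matrix in layers:
        # Every node in the next layer sees every output of the current layer.
        activations = [
            sigmoid(sum(a * w for a, w in zip(activations, node_weights)))
            for node_weights in matrix
        ]
    return activations

# Hypothetical 2-3-1 network with hand-picked weights.
layers = [
    [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]],  # input (2 nodes) -> hidden (3 nodes)
    [[0.7, -0.4, 0.2]],                      # hidden (3 nodes) -> output (1 node)
]
print(forward([1.0, 2.0], layers))  # a list with one value, roughly 0.525
```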
Generally, the number of nodes in the output layer depends on the problem we are trying to solve. If we are classifying images into three types (shirts, pants, shoes), the output layer will have three nodes, one per class. If we are predicting an outcome, e.g. whether a news article is positive or negative, the output layer will have two nodes.
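In the clothing example, each of the three output nodes corresponds to one class, and the prediction is the class whose node has the largest value. The sketch below also applies softmax, a common (but not the only) way to turn raw output values into probabilities. The raw values are invented for illustration.

```python
import math

def softmax(values):
    """Turn raw output-node values into probabilities that sum to 1."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# One output node per class (hypothetical raw values from a trained network).
classes = ["shirt", "pants", "shoes"]
raw_outputs = [2.0, 0.5, 1.0]

probs = softmax(raw_outputs)
predicted = classes[probs.index(max(probs))]
print(predicted)  # shirt
```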
Training the network
The purpose of training an NN is to optimize the weights of its connections so that it produces the correct output, or in other words, so that the cost function is minimized.
Cost Function (Loss)
In layman’s terms, the cost function is the difference (or distance) between the network’s output and the expected output. The network learns by changing its weights so as to make this cost as small as possible.
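Mean squared error is one common choice of cost function, and others such as cross-entropy are also widely used. It averages the squared differences between the outputs and the expected outputs. The output values below are made up for illustration.

```python
def mean_squared_error(predicted, expected):
    """Average of the squared differences between outputs and targets."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    return sum((p - e) ** 2 for p, e in zip(predicted, expected)) / len(predicted)

# Hypothetical network output vs. the expected answer "shirt" (one node per class).
output = [0.7, 0.2, 0.1]
expected = [1.0, 0.0, 0.0]
print(mean_squared_error(output, expected))  # about 0.0467
```

A perfect prediction gives a cost of 0, and the further the output is from the expected values, the larger the cost.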
For the network to learn, i.e. to minimize the cost function, it uses a process called backpropagation. The mathematics behind it (calculus, specifically the chain rule) can get involved. In short, we work out how much each weight contributed to the error in the loss function, so that on the next pass we can adjust the weights to reduce that error. To do this, the error is propagated backwards through the network and the weights are updated.
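Backpropagation is easiest to see on a single sigmoid node trained with squared error. The sketch below applies the chain rule by hand and then updates each weight with plain gradient descent. The learning rate of 0.5, the starting weights, and the `train_step` helper are assumptions for illustration. In a multi-layer network the same chain-rule step is repeated backwards through every layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(inputs, weights, target, learning_rate=0.5):
    """One forward and one backward pass for a single sigmoid node.

    Loss is (output - target) ** 2. By the chain rule,
    dLoss/dw_i = 2 * (output - target) * output * (1 - output) * x_i.
    Returns the updated weights and the loss before the update.
    """
    z = sum(x * w for x, w in zip(inputs, weights))
    output = sigmoid(z)
    loss = (output - target) ** 2
    # How much the loss changes as z changes: the error signal flowing backwards.
    delta = 2.0 * (output - target) * output * (1.0 - output)
    # Each weight's share of the blame is proportional to the input it multiplied.
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, loss

inputs = [1.0, 0.5]
weights = [0.2, -0.4]
target = 1.0
for step in range(5):
    weights, loss = train_step(inputs, weights, target)
    print(f"step {step}: loss {loss:.4f}")
```

Each printed loss should be smaller than the one before, because every update nudges the node’s output toward the target.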
To summarize, the whole process works like this:
- Input data is passed to the input layer.
- The input layer passes the data to the hidden layers.
- The weights are initialized randomly, and for each node a weighted sum of its inputs is calculated.
- This weighted sum is passed through an activation function and then on to the next layer, until we reach the output layer.
- Once we have the output, the error is calculated using the loss function.
- The error is then propagated backwards and the weights are updated to reduce it.
- This process is repeated for the entire data set, and at the end we can measure the network’s accuracy and loss.
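Putting the steps above together, here is a minimal, self-contained training loop for a tiny network. It has two inputs, one hidden layer of three nodes, and one output node. The sketch assumes sigmoid activations, squared-error loss, per-sample gradient descent, and no bias terms. It uses the logical OR function as a toy data set. All names and hyperparameters are illustrative and not taken from any particular library.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Input -> hidden layer -> single output node (no bias terms)."""
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, ws))) for ws in w_hidden]
    out = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    return hidden, out

def train(data, n_hidden=3, epochs=2000, learning_rate=0.5, seed=0):
    """Train with per-sample gradient descent; returns weights and mean loss per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Random initial weights.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, target in data:
            # Forward pass through the hidden layer to the output.
            hidden, out = forward(x, w_hidden, w_out)
            # Squared-error loss for this sample.
            total_loss += (out - target) ** 2
            # Backward pass (chain rule): output node first, then hidden nodes.
            delta_out = 2.0 * (out - target) * out * (1.0 - out)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Move every weight a small step against its gradient.
            w_out = [w - learning_rate * delta_out * h for w, h in zip(w_out, hidden)]
            for j in range(n_hidden):
                w_hidden[j] = [w - learning_rate * delta_hidden[j] * xi
                               for w, xi in zip(w_hidden[j], x)]
        history.append(total_loss / len(data))
    return w_hidden, w_out, history

def accuracy(data, w_hidden, w_out):
    """Fraction of samples where the output, thresholded at 0.5, matches the target."""
    correct = sum(1 for x, t in data
                  if (forward(x, w_hidden, w_out)[1] >= 0.5) == (t == 1))
    return correct / len(data)

# Toy data set (logical OR): two inputs, one 0/1 target.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w_hidden, w_out, history = train(data)
print(f"loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
print(f"accuracy: {accuracy(data, w_hidden, w_out):.2f}")
```

Libraries such as PyTorch and TensorFlow compute the backward pass automatically. Writing it out by hand here only serves to make each step visible.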
This is a very brief and simple explanation of NNs, also called ANNs (artificial neural networks). In subsequent posts we will look at them in much more detail.