Softmax is a special kind of activation function that is mostly used in the last layer. The final output of a neural network (NN) needs to be human-understandable, so for a multi-class classification problem we want the output to be a set of probabilities.
A multi-class classification problem is one in which the NN tries to predict whether a given input is, say, a “cat”, a “dog”, a “rat”, and so on. In such a case we need a probability distribution over the classes in the output to be able to interpret it.
Suppose our neural network gives the output
[10, 31, 15]
This won’t make much sense to us, but if we get the output
[.9833, .001, .002312] // ignore the actual values
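this makes more sense, as it means there is a 98% chance that the input is a cat.
This is what the softmax function does! It turns the raw scores into values between 0 and 1 that add up to 1, so they can be read as probabilities.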
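To make this concrete, here is a minimal sketch of softmax in plain Python, applied to the raw scores [10, 31, 15] from above. The function name, the class order in `labels`, and the max-subtraction trick are assumptions for illustration and are not taken from the original notebook.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score first avoids overflow in exp()
    # and does not change the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["cat", "dog", "rat"]  # hypothetical class order
raw_scores = [10, 31, 15]       # the raw network output from above
probs = softmax(raw_scores)

# Show the probability assigned to each class.
for label, p in zip(labels, probs):
    print(f"{label}: {p:.6f}")

# The predicted class is the one with the highest probability.
best = labels[probs.index(max(probs))]
print("predicted class:", best)
```

With these particular scores almost all of the probability lands on the second entry, because softmax exponentiates the scores and the gap between 31 and the other values becomes very large.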
We can also use sigmoid as the last activation function when the problem is a binary classification problem.
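To see why sigmoid fits the binary case, here is a small sketch in plain Python: a sigmoid over a single score z gives the same probability as a two-class softmax over the scores [z, 0]. The function names and the sample score are illustrative assumptions, not part of the original notebook.

```python
import math

def sigmoid(z):
    """Squash a single raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

z = 2.0  # hypothetical raw score for the positive class
p_sigmoid = sigmoid(z)
p_softmax = softmax([z, 0.0])[0]

# Both values are the probability of the positive class.
print(p_sigmoid, p_softmax)
```

So with sigmoid a single output value is enough, and the probability of the other class is simply 1 minus that value.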
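To implement this in our neural network:
# Assumes `model` is an existing Keras Sequential model,
# with Dense and Activation imported from keras.layers.
model.add(Dense(2))  # one output unit per class (two classes here)
model.add(Activation("softmax"))  # convert the raw scores into probabilities
model.summary()  # print an overview of the layers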
Ok, again very simple!
https://colab.research.google.com/drive/1IOjSB1PF85t9qEhCWgDbWxuCRSDz71H6
Read more here
https://stats.stackexchange.com/questions/233658/softmax-vs-sigmoid-function-in-logistic-classifier
https://medium.com/aidevnepal/for-sigmoid-funcion-f7a5da78fec2