#concept

softmax turns a vector of K real values (model predictions, a.k.a. logits) into K probabilities that sum to 1.

It is used as an activation function.

use of softmax in a neural network >> It is often used as the last activation function of a neural network, to normalize the network's output into a probability distribution over the predicted output classes.

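standard (unit) softmax equation:

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \text{for } i = 1, \dots, K$$

where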

  • sigma (σ) is the output: a vector of K real numbers, each between 0 and 1, that together sum to 1
  • z is the input vector (z_1, …, z_K)
  • K is the length of z, i.e. the number of output classes
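
A minimal sketch of the definitions above in plain Python, to make the roles of z, K, and sigma concrete. The logit values are made up for illustration, and subtracting max(z) before exponentiating is a common numerical-stability trick that is assumed here, not something the equation itself requires (it leaves the result unchanged).

```python
import math

def softmax(z):
    """Map a list of K real numbers to K probabilities that sum to 1."""
    # Shifting by max(z) does not change the output but avoids overflow in exp.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, 1.0, 0.1]   # hypothetical logits for K = 3 classes
probs = softmax(z)

print(probs)          # roughly [0.659, 0.242, 0.099]
print(sum(probs))     # 1.0 up to floating-point rounding
```

The largest input gets the largest probability, and the ordering of the inputs is preserved in the output.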

re: 224w, intuitively… softmax is great because exponentiating the dot products between vectors amplifies the differences between them, so the largest dot product dominates, and dividing by the sum normalizes across all dot products.
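
A small sketch of that intuition, assuming the dot products are similarity scores between one query vector and a few candidate vectors. The vectors and the names `query` and `candidates` are hypothetical, chosen only to show how the gap between scores widens after softmax.

```python
import math

def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def softmax(z):
    """Map a list of real numbers to probabilities that sum to 1."""
    m = max(z)  # shift for numerical stability; result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

query = [1.0, 2.0]                                   # hypothetical embedding
candidates = [[1.0, 2.0], [0.5, 1.0], [-1.0, 0.0]]   # hypothetical embeddings

scores = [dot(query, c) for c in candidates]  # raw dot products: 5.0, 2.5, -1.0
probs = softmax(scores)                       # normalized across all dot products

for s, p in zip(scores, probs):
    print(f"dot product {s:5.1f} -> probability {p:.4f}")
```

The top score is only twice the second one, yet it takes more than 90% of the probability mass, because the ratio of two softmax outputs is the exponential of the difference between their scores.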

References

Notes