The softmax function is widely used in machine learning and artificial neural networks, particularly for classification tasks. It takes a vector of real numbers as input and transforms it into a probability distribution over multiple classes: each class receives a probability score indicating how likely the input is to belong to that class, and the probabilities sum to 1.
The softmax function is commonly used in the final layer of a neural network for multi-class classification problems, where the goal is to assign an input to one of several possible classes. It converts the raw scores (also known as [[logit]]s) produced by the network into normalized probabilities.
Mathematically, for a vector of logits $z = (z_1, \dots, z_K)$, the softmax of the $i$-th component is defined as:
$$
\text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
$$
Where:
- $e$ is the base of the natural logarithm (Euler's number).
- $z_i$ is the raw score (logit) for class $i$.
- $K$ is the total number of classes.
The softmax function exponentiates the input logits, making them positive, and then normalizes them by dividing each exponentiated value by the sum of all exponentiated values across all classes. This normalization ensures that the resulting probabilities are between 0 and 1 and sum up to 1, making them suitable for representing class probabilities.
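As a minimal sketch, this is how the formula above can be implemented in NumPy (the function and variable names are illustrative; subtracting the maximum logit is a common numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Convert a vector of logits into a probability distribution."""
    # Subtracting the max is a common trick to avoid overflow with large logits;
    # it leaves the result unchanged because softmax is shift-invariant.
    shifted = logits - np.max(logits)
    exp_scores = np.exp(shifted)             # exponentiate: all values become positive
    return exp_scores / np.sum(exp_scores)   # normalize so the probabilities sum to 1
```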
In the context of a neural network, if you have an output layer with multiple neurons (one for each class), applying the softmax function to the output activations will yield a probability distribution over the classes. The class with the highest probability is often chosen as the predicted class label.
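Continuing the sketch above with hypothetical logits for a three-class problem, the predicted class is simply the index with the highest probability:

```python
logits = np.array([2.0, 1.0, 0.1])        # hypothetical raw scores for 3 classes
probs = softmax(logits)                   # roughly [0.66, 0.24, 0.10]
predicted_class = int(np.argmax(probs))   # index of the most probable class (0 here)
print(probs, predicted_class)
```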
[[dot product]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[logit]]