In machine learning, a logit (short for _logistic unit_) is an intermediate value that represents the strength of evidence in favor of a binary outcome.
Logits are the output of a linear transformation applied to the input features; the logistic function (also known as the sigmoid function) then maps them to probabilities.
Mathematically, the relationship between the logit $z$ and the probability $p$ is given by the sigmoid function:
$ p = \frac{1}{1 + e^{-z}} $
Where:
- $p$ is the predicted probability of the positive class (the binary outcome being 1).
- $z$ is the logit, the result of the linear combination of input features and weights.
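Inverting the sigmoid makes the name explicit: the logit is the log-odds, the logarithm of the ratio between the probability of the positive class and that of the negative class:
$ z = \ln\left(\frac{p}{1-p}\right) $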
Logits have several advantages over directly using probabilities in models like logistic regression:
- They let the model work directly with a linear combination of features, deferring the conversion to probabilities to the final step.
- They are unbounded, ranging over $(-\infty, +\infty)$: positive logits indicate evidence for the positive class and negative logits evidence for the negative class, as the sketch below illustrates.
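As a minimal sketch of this pipeline (assuming NumPy; the weights, bias, and input values are hypothetical, not from any particular model), the following computes a logit as a linear combination of features and maps it to a probability with the sigmoid:

```python
import numpy as np

def sigmoid(z: float) -> float:
    """Map a logit z (any real number) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights and bias for a 3-feature logistic model.
w = np.array([0.8, -1.2, 0.5])
b = 0.1

x = np.array([1.0, 0.5, 2.0])  # one input example
z = float(x @ w + b)           # logit: unbounded linear score
p = sigmoid(z)                 # probability of the positive class

print(f"logit z = {z:.3f} -> probability p = {p:.3f}")
# z = 1.300 -> p = 0.786: a positive logit yields p > 0.5,
# while a negative logit would yield p < 0.5.
```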
[[softmax function]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[attention head]]