The output layer of the [[transformer decoder]] produces a vector of raw scores (logits), one for each token in the vocabulary. The [[softmax function]] is then applied to these logits to convert them into a probability distribution over the vocabulary of possible next tokens.
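As a minimal sketch (the toy vocabulary and logit values below are made up for illustration), the step from the output layer's logits to a next-token distribution looks like this in Python:

```python
import numpy as np

# Hypothetical values: the decoder's output layer yields one raw score
# (logit) per vocabulary token; softmax turns them into probabilities.
vocab = ["the", "cat", "sat", "mat"]        # toy vocabulary (assumed)
logits = np.array([2.0, 1.0, 0.5, -1.0])    # raw scores from the output layer (assumed)

def softmax(z):
    z = z - z.max()                          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()                       # normalize so the values sum to 1

probs = softmax(logits)                      # probability distribution over the vocabulary
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
print("next token:", vocab[int(np.argmax(probs))])  # greedy pick of the most likely token
```

Subtracting the maximum logit before exponentiating does not change the resulting distribution but avoids numerical overflow when logits are large.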