Positional encoding is a technique used in [[transformer model]]s to provide information about the position or order of words in a sequence. Since transformers do not have an inherent understanding of sequential order, positional encoding is added to the [[input embedding]]s of the words to help the model differentiate between words based on their position in the sequence.
In [[natural language processing]], word order is crucial for understanding the meaning of a sentence. Traditional recurrent neural networks (RNNs) process tokens one at a time and convolutional neural networks (CNNs) operate over local windows, so both capture order implicitly, but transformers, which rely on [[self-attention]] mechanisms, treat their input as an unordered set of tokens and lack this built-in sequence understanding. Positional encoding is the workaround that addresses this limitation.
In the original transformer, positional encoding adds a set of fixed sinusoidal functions with different frequencies and phases to the [[input embedding|word embedding]]s. These functions are designed so that every position in the sequence receives a distinct encoding. The positional encoding is added element-wise to the original [[input embedding|word embedding]], producing an enriched embedding that carries both semantic and positional information.
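A minimal sketch of this scheme in NumPy, assuming the standard sinusoidal formulation PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the sequence length, model dimension, and random embeddings below are placeholders for illustration:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal encodings: PE[pos, 2i]   = sin(pos / 10000^(2i/d_model)),
                                   PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
    positions = np.arange(seq_len)[:, np.newaxis]            # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]           # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)   # one frequency per dimension pair

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Placeholder word embeddings; in practice these come from the model's embedding layer.
seq_len, d_model = 10, 16
word_embeddings = np.random.randn(seq_len, d_model)

# Element-wise addition yields embeddings carrying both semantic and positional information.
enriched = word_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(enriched.shape)  # (10, 16)
```

Because the frequencies vary geometrically across dimensions, nearby positions get similar encodings while distant positions differ, which lets the model infer relative order from the added signal.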
Positional encoding is thus essential for transformer models to handle sequential data such as text: it lets them reason about word order while preserving the advantages of self-attention, such as the ability to capture long-range dependencies and relationships in the data.
[[input embedding]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[learnius/llms/2 LLMs and Transformers/attention]]