The encoder part of the [[transformer model]]'s architecture is the first stage of processing: it turns the input (usually a sequence of tokens, such as words in a sentence) into meaningful representations that the rest of the model can build on.
The encoding process involves converting the input tokens into numerical embeddings and incorporating positional information to enable the model to understand the order of the tokens.
The first two components of the encoding part of a transformer are the [[input embedding]] layer and the [[positional encoding]].
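As a rough sketch of these two components (assuming PyTorch and the sinusoidal positional encoding from the original transformer paper; `vocab_size`, `d_model`, and `max_len` are illustrative parameters, not something this note prescribes):

```python
import math
import torch
import torch.nn as nn

class InputEmbeddingWithPosition(nn.Module):
    """Token embedding plus fixed sinusoidal positional encoding (sketch)."""
    def __init__(self, vocab_size: int, d_model: int, max_len: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.d_model = d_model

        # Precompute the sinusoidal position table of shape (max_len, d_model).
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        self.register_buffer("pe", pe)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> embeddings: (batch, seq_len, d_model)
        x = self.embed(token_ids) * math.sqrt(self.d_model)
        # Add position information so the model can distinguish token order.
        return x + self.pe[: token_ids.size(1)]
```

For example, `InputEmbeddingWithPosition(vocab_size=30000, d_model=512)` maps a `(2, 10)` batch of token ids to a `(2, 10, 512)` tensor that the encoder layers then consume.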
The combined output of the embedding layer and positional encoding provides the initial input for the subsequent layers of the transformer model: the [[multi-head self-attention]] and [[feedforward neural network]] layers. These layers process the encoded input representations to capture relationships, patterns, and dependencies within the data. Both of these sub-layers are wrapped in [[residual connection|residual connections]].
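A minimal sketch of one such encoder layer (assuming PyTorch's `nn.MultiheadAttention` and the post-layer-norm arrangement of the original transformer; the layer norms and the hyperparameters `d_model`, `n_heads`, `d_ff` are illustrative assumptions, not details from this note):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention and feed-forward sub-layers,
    each wrapped in a residual connection (sketch)."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -- the embedded + positionally encoded input
        attn_out, _ = self.self_attn(x, x, x)   # every position attends to every other position
        x = self.norm1(x + attn_out)            # residual connection around self-attention
        x = self.norm2(x + self.ff(x))          # residual connection around the feed-forward net
        return x
```

A full encoder stacks several of these layers, feeding the output of one as the input to the next.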
The encoding part of a transformer model is crucial for understanding the context and structure of the input sequence. It enables the model to process sequential data in parallel, efficiently capturing long-range dependencies and interactions between different parts of the sequence.
[[residual connection]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[transformer decoder]]