This section provides a primer on the fundamental concepts underlying transformers and large language models. By introducing these key notions, we aim to build a foundational understanding of these technologies.

## Concepts

- [[large language model]]
- [[transformer model]]
- [[token]]
- [[input embedding]]
- [[positional encoding]]
- [[learnius/llms/2 LLMs and Transformers/attention]]
- [[self-attention]] (a minimal sketch of how the attention-related notions fit together appears at the end of this note)
- [[query]]
- [[key]]
- [[value]]
- [[attention score]]
- [[dot product]]
- [[softmax function]]
- [[logit]]
- [[attention head]]
- [[multi-head self-attention]]
- [[residual connection]]
- [[transformer encoder]]
- [[transformer decoder]]
- [[masked multi-head self-attention]]
- [[encoder-decoder attention]]
- [[transformer output layer]]
- [[learnius/llms/2 LLMs and Transformers/prompt|prompt]]
- [[hallucination]]
- [[base LLM]]
- [[instruction-tuned LLM]]

[[1 Machine Learning Basics]] < [[Hands-on LLMs]] > [[3 Instruction-Tuned LLM Systems]]
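
To make the attention-related entries above more concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention, showing how queries, keys, values, dot products, attention scores, and the softmax function fit together. The function name, the toy shapes, and the random inputs are illustrative assumptions for this note, not definitions taken from the linked concept pages.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: Q, K, V are (seq_len, d_k) arrays."""
    d_k = Q.shape[-1]
    # Attention scores: dot products between queries and keys, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores (logits) into a probability distribution.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted sum of the value vectors.
    return weights @ V

# Toy example: 3 tokens with 4-dimensional embeddings (random, illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                                # token embeddings
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))  # projection matrices
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                           # (3, 4)
```

Multi-head self-attention repeats this computation with several independent projection matrices (one set per attention head) and concatenates the results.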