In the context of [[machine learning]] and mathematics, a dot product (also known as an inner product or scalar product) is an operation that takes two vectors and produces a scalar (a single numerical value). It is calculated by multiplying the corresponding components of the two vectors and summing the results. For two vectors A and B, each with n components (or dimensions):

Dot Product (A ⋅ B) = A₁ * B₁ + A₂ * B₂ + ... + Aₙ * Bₙ

Where:
- A₁, A₂, ..., Aₙ are the components of vector A.
- B₁, B₂, ..., Bₙ are the components of vector B.

For example, with A = (1, 2, 3) and B = (4, 5, 6), A ⋅ B = 1*4 + 2*5 + 3*6 = 32.

The resulting scalar reflects how similar or aligned the directions of the two vectors are. If the vectors point in roughly the same direction (angle less than 90°), the dot product is positive; if they point in roughly opposite directions (angle greater than 90°), it is negative; if they are orthogonal (perpendicular), it is zero.

In machine learning and AI, the dot product is commonly used for several purposes:

1. **Calculating Similarity:** It measures how similar two vectors are, which is important in tasks like recommendation systems, clustering, and information retrieval (see the sketch below).
2. **Projection:** It can be used to project one vector onto another, showing how much of one vector's direction lies along the direction of the other.
3. **Weighted Sum:** In neural networks, the dot product underlies matrix multiplication: each output element is the dot product of a row of one matrix with a column of the other, i.e. a weighted sum of inputs.
4. **Self-Attention:** In transformer models, dot products are used to calculate attention scores between query and key vectors (see the sketch below).

The dot product is a fundamental operation that plays a significant role in many mathematical and AI concepts, particularly those involving vectors and vector spaces.
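As a concrete illustration of the points above, here is a minimal NumPy sketch (not part of the original note; the vector values and dimensions are made up for illustration) showing the dot product itself, cosine similarity, projection, and scaled dot-product attention scores:

```python
import numpy as np

# Two example vectors (made-up values for illustration)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Dot product: sum of element-wise products
dot = np.dot(a, b)  # 1*4 + 2*5 + 3*6 = 32.0

# Similarity: normalizing the dot product by the vector lengths
# gives cosine similarity, a common directional-similarity measure
cos_sim = dot / (np.linalg.norm(a) * np.linalg.norm(b))

# Projection of a onto b: how much of a lies along b's direction
proj_a_on_b = (dot / np.dot(b, b)) * b

# Self-attention (transformer-style): each raw score is the dot
# product of a query vector with a key vector, scaled by sqrt(d),
# then a softmax turns each row of scores into attention weights
d = 4                          # embedding dimension (illustrative)
Q = np.random.rand(3, d)       # 3 query vectors
K = np.random.rand(3, d)       # 3 key vectors
scores = Q @ K.T / np.sqrt(d)  # raw attention scores
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

print(dot, cos_sim, proj_a_on_b)
print(weights)                 # each row sums to 1
```

In the attention example, every entry of `scores` is the dot product of one query vector with one key vector; the softmax (covered in [[softmax function]]) converts each row of scores into weights that sum to 1.

[[attention score]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[softmax function]]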