Queries are also derived from the input embeddings and represent the word for which the attention distribution is being computed. In the context of [[self-attention]], each word in the sequence serves as a query to gather information from other words in the same sequence.
In [[transformer model]]s, a query vector is computed for each element (word) of the input sequence.
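A minimal NumPy sketch of this idea (the dimensions and random weights below are illustrative assumptions, not values from any specific model): each word's embedding is projected into a query vector, which is then scored against every word's key to produce that word's attention distribution.

```python
import numpy as np

# Toy dimensions: 4 words, embedding size 8 (assumed for illustration)
seq_len, d_model = 4, 8
rng = np.random.default_rng(0)

X = rng.normal(size=(seq_len, d_model))    # input embeddings, one row per word
W_Q = rng.normal(size=(d_model, d_model))  # learned query projection (random here)
W_K = rng.normal(size=(d_model, d_model))  # learned key projection (random here)

Q = X @ W_Q  # one query vector per word: "what is this word looking for?"
K = X @ W_K  # one key vector per word:   "what does this word offer?"

# Each word's query is scored against every word's key (scaled dot product),
# and a row-wise softmax turns the scores into an attention distribution.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

print(weights.shape)  # (4, 4): one attention distribution per query word
```

Row i of `weights` is the attention distribution computed for word i acting as the query over all words in the sequence.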
[[self-attention]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[key]]