In the context of [[large language model|LLM]]s, a token is the basic input unit of a [[transformer model]]. Tokens can be characters, words, sub-words, or special control tokens such as `<EOS>`. The set of all tokens is called the vocabulary, and, unlike a natural language, most [[large language model|LLM]] systems use a fixed-size vocabulary.
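A toy sketch of the idea, assuming a greedy longest-match tokenizer over a fixed vocabulary (the vocabulary entries, the token ids, and the `<EOS>` handling here are illustrative, not taken from any real tokenizer such as BPE or SentencePiece):

```python
# Fixed-size vocabulary mapping token strings to integer ids.
# Id 0 is reserved for the <EOS> control token (an assumption).
VOCAB = {"<EOS>": 0, "low": 1, "lower": 2, "est": 3,
         "e": 4, "r": 5, "l": 6, "o": 7, "w": 8, "s": 9, "t": 10}

def tokenize(text: str) -> list[int]:
    """Greedily match the longest vocabulary entry at each position."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, shrinking until a match.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    ids.append(VOCAB["<EOS>"])  # terminate the sequence with a control token
    return ids

print(tokenize("lowest"))  # "low" + "est" + <EOS> -> [1, 3, 0]
print(tokenize("lower"))   # whole word is in the vocabulary -> [2, 0]
```

Note how `"lowest"` splits into two sub-word tokens while `"lower"` maps to a single token: with a fixed-size vocabulary, rarer strings decompose into smaller pieces rather than growing the vocabulary.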
[[transformer model]] < [[Hands-on LLMs]]/[[2 LLMs and Transformers]] > [[input embedding]]