text normalization

In text-to-speech (TTS) synthesis, text normalization is the step that converts written text into a standardized form that is easier to analyze. It handles [[non-standard word|non-standard words]] such as numbers, money amounts, dates, times, and abbreviations, which must be rewritten as ordinary words; for example, "$5" is read as "five dollars". Although modern end-to-end TTS systems can perform some normalization on their own, the limited amount of training data usually means that a separate normalization step is still used. This separate step can use a [[rule-based system]] or an [[encoder-decoder model]].

Rule-based normalization is done in two stages: tokenization and verbalization. The tokenization stage detects non-standard words and [[classification of non-standard words|classifies]] them by identifying the [[semiotic class]] each one belongs to. Using this information, the verbalization stage converts the tokens into standard words.

An alternative approach uses encoder-decoder models. These have been reported to be more effective for transduction tasks, but they require expert-labeled training data in which non-standard words have already been replaced with their appropriate verbalizations. Such training sets are available for some languages. In the simplest encoder-decoder setting, the problem is treated as a machine translation task, with the written text as the source and its verbalized form as the target.
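The two rule-based stages described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a description of any particular system: the function names (`tokenize`, `verbalize`, `normalize`), the class labels (`MONEY`, `TIME`, `CARDINAL`, `ABBREVIATION`, `PLAIN`), the regular expressions, and the small abbreviation table are all hypothetical. It covers only English, whitespace-separated tokens, and numbers below one million.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical abbreviation table; a real system would need context to
# disambiguate entries such as "Dr." (doctor vs. drive).
ABBREVIATIONS = {"Dr.": "doctor", "approx.": "approximately"}

# Illustrative patterns, one per semiotic class handled by this sketch.
MONEY_RE = re.compile(r"^\$(\d+)(?:\.(\d{2}))?$")
TIME_RE = re.compile(r"^(\d{1,2}):(\d{2})$")
CARDINAL_RE = re.compile(r"^\d+$")


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def classify(token: str) -> str:
    """Assign a semiotic class label to a single token."""
    if token in ABBREVIATIONS:
        return "ABBREVIATION"
    if MONEY_RE.match(token):
        return "MONEY"
    if TIME_RE.match(token):
        return "TIME"
    if CARDINAL_RE.match(token):
        return "CARDINAL"
    return "PLAIN"


def tokenize(text: str) -> list[tuple[str, str]]:
    """Stage 1: split on whitespace and tag each token with its class."""
    return [(tok, classify(tok)) for tok in text.split()]


def verbalize(token: str, cls: str) -> str:
    """Stage 2: convert a classified token into standard words."""
    if cls == "ABBREVIATION":
        return ABBREVIATIONS[token]
    if cls == "CARDINAL":
        return number_to_words(int(token))
    if cls == "MONEY":
        dollars, cents = MONEY_RE.match(token).groups()
        d = int(dollars)
        words = number_to_words(d) + (" dollar" if d == 1 else " dollars")
        if cents and int(cents):
            c = int(cents)
            words += " and " + number_to_words(c) + (" cent" if c == 1 else " cents")
        return words
    if cls == "TIME":
        hours, minutes = (int(g) for g in TIME_RE.match(token).groups())
        if minutes == 0:
            return number_to_words(hours) + " o'clock"
        if minutes < 10:
            return number_to_words(hours) + " oh " + number_to_words(minutes)
        return number_to_words(hours) + " " + number_to_words(minutes)
    return token  # PLAIN tokens pass through unchanged


def normalize(text: str) -> str:
    """Run both stages and join the verbalized tokens."""
    return " ".join(verbalize(tok, cls) for tok, cls in tokenize(text))


print(normalize("Dr. Smith paid $5.50 for 12 apples at 9:05"))
# doctor Smith paid five dollars and fifty cents for twelve apples at nine oh five
```

Keeping classification separate from verbalization mirrors the two-stage design: the class label decides which verbalization rule applies, so the same digits can be read differently as a cardinal, a money amount, or a time.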