Grapheme-to-phoneme (G2P) conversion

Converting written text (graphemes) into a sequence of speech sounds (phonemes) is an important step in creating synthetic speech. For instance, the word "speech" can be transformed into the [[phoneme]] sequence "s p iy ch" using the [[ARPAbet]] phonetic alphabet. Grapheme-to-phoneme (G2P) conversion is typically based on a manually compiled dictionary that maps written forms to their pronunciations. However, for languages such as English that use an alphabet, no dictionary can cover every possible word, so a G2P system must also be able to generate pronunciations for words it has not seen. This can be done with hand-crafted rules or with a statistical model trained on a large dictionary. In some languages, such as Portuguese, sounds at word boundaries change depending on the adjacent words (sandhi), so words cannot always be converted in isolation. In contrast, for languages such as Chinese that are written with characters rather than letters, the dictionary may cover most characters, but many characters have several possible pronunciations that depend on context. In this case, G2P conversion serves mainly to disambiguate such polyphones, choosing the appropriate pronunciation from the surrounding text.
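The two strategies described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not a real G2P system: the lexicon, the letter-to-sound rules, the tone-numbered pinyin readings, and the function names `g2p`, `rules_g2p`, and `read_characters` are all hypothetical and chosen only for illustration. The first half looks a word up in a dictionary and falls back to hand-crafted rules for unknown words; the second half resolves a polyphonic Chinese character by preferring a reading stored for a longer word.

```python
# Toy pronunciation dictionary (ARPAbet symbols, lowercase as in the text).
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
}

# Hypothetical hand-crafted letter-to-sound rules (illustrative only;
# real English rules are far larger and context-dependent).
RULES = {
    "ch": ["ch"],
    "ee": ["iy"],
    "b": ["b"],
    "p": ["p"],
    "s": ["s"],
    "t": ["t"],
}


def rules_g2p(word):
    """Greedy longest-match conversion using the toy rules."""
    phonemes = []
    i = 0
    while i < len(word):
        for size in (2, 1):  # try two-letter graphemes before single letters
            chunk = word[i:i + size]
            if chunk in RULES:
                phonemes.extend(RULES[chunk])
                i += len(chunk)
                break
        else:
            i += 1  # no rule for this letter: skip it
    return phonemes


def g2p(word):
    """Dictionary lookup first, rule-based fallback for unknown words."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    return rules_g2p(word)


# Polyphone disambiguation: the character 行 is read "xing2" by default
# but "hang2" in the word 银行 ("bank"). Readings are tone-numbered pinyin.
CHAR_DEFAULT = {"行": "xing2", "银": "yin2", "走": "zou3"}
WORD_READINGS = {"银行": ["yin2", "hang2"]}


def read_characters(text):
    """Prefer a two-character word reading, else the per-character default."""
    readings = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in WORD_READINGS:
            readings.extend(WORD_READINGS[pair])
            i += 2
        else:
            readings.append(CHAR_DEFAULT.get(text[i], "?"))
            i += 1
    return readings
```

With these toy tables, `g2p("speech")` is answered from the dictionary, while the unlisted word "beech" is handled by the fallback rules. `read_characters("银行")` picks the word-level reading of 行, whereas `read_characters("行走")` uses its default reading. A statistical model would replace both the rule table and the word-level lookup with predictions learned from a large dictionary.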