The most common way to evaluate speech synthesis systems is to collect the opinions of human listeners. Because such an assessment depends on human judgment, a method of this kind is called a [[subjective test]].
The two most common [[subjective test|subjective tests]] to evaluate the quality of synthesized speech are:
- [[mean opinion score (MOS)]], where human listeners rate each utterance on a scale from 1 to 5 and the ratings are averaged.
- [[AB test]], where human listeners choose which of two utterances they prefer.
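Both tests reduce to simple statistics over the listeners' responses. The following is a minimal sketch, assuming ratings are collected as plain numbers and AB choices as the labels `"A"` and `"B"`; the function names and the normal-approximation confidence interval are illustrative choices, not part of any standard toolkit.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes `ratings` is a list of listener scores on the 1-5 scale.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


def ab_preference(choices):
    """Return the fraction of trials in which system "A" was preferred.

    Assumes `choices` is a list of "A" / "B" labels, one per trial.
    """
    if not choices:
        raise ValueError("need at least one trial")
    return sum(1 for c in choices if c == "A") / len(choices)


if __name__ == "__main__":
    # Hypothetical listener data, for illustration only.
    mos, ci = mean_opinion_score([4, 5, 3, 4, 4])
    print(f"MOS = {mos:.2f} +/- {ci:.2f}")
    print(f"A preferred in {ab_preference(['A', 'B', 'A', 'A']):.0%} of trials")
```

Reporting the interval alongside the mean helps show whether a difference between two systems is larger than the spread among listeners.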
Another approach to evaluating text-to-speech (TTS) systems is to use one or more [[objective test]]s. Commonly used objective tests include [[perceptual evaluation of speech quality (PESQ)]], [[mel cepstral distortion (MCD)]], and [[word error rate (WER)]]. Finding an automatic metric that reliably agrees with human judgments remains an open area of research.
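As an illustration of one objective test, WER can be sketched as the word-level edit distance between a reference text and a transcript, divided by the number of reference words. The sketch below assumes the transcript of the synthesized audio has already been obtained (for example from a speech recognizer) and that both strings are already normalized; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization and pre-normalized text
    (casing and punctuation handled by the caller).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical input text and recognizer transcript.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the transcript is much longer than the reference.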