In spoken language processing, speech synthesis is the task of generating artificial speech from a message that expresses a communication intent, in a process similar to the [[human speech production mechanism]].
![[heiga-zen-production.png]]
[Heiga Zen 2017](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45882.pdf)
When the message is expressed as a string of letters, the speech synthesis task is called [[text-to-speech synthesis (TTS)]].
Speech synthesis can be used for a variety of purposes, including creating voiceovers for videos, generating audio content for people with visual impairments, and providing text-to-speech capabilities for computer systems and mobile devices.
The speech synthesis task can also take other forms of communication intent as input, such as a sequence of concepts generated by a machine or selected on picture boards in the context of [[augmentative and alternative communication (AAC)]]. In this case, it is common to use the term [[concept-to-speech (CTS)]].
A more recent speech synthesis task is the production of intelligible speech from brain activity, which allows people with severe impairments caused by neurological disorders to communicate more naturally. The term [[brain-to-speech]] is commonly used for this task.
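
The naming convention described in this note can be summarized as a small lookup from input type to task name. The sketch below is purely illustrative: `InputKind`, `TASK_NAME`, and `task_name` are hypothetical names introduced here, not part of any speech synthesis library.

```python
from enum import Enum


class InputKind(Enum):
    """Hypothetical labels for the form the input message takes."""
    TEXT = "string of letters"
    CONCEPTS = "sequence of concepts"
    BRAIN_ACTIVITY = "brain activity"


# Terms as used in this note, keyed by the kind of input message.
TASK_NAME = {
    InputKind.TEXT: "text-to-speech synthesis (TTS)",
    InputKind.CONCEPTS: "concept-to-speech (CTS)",
    InputKind.BRAIN_ACTIVITY: "brain-to-speech",
}


def task_name(kind: InputKind) -> str:
    """Return the term this note uses for the given input kind."""
    return TASK_NAME[kind]
```

For example, `task_name(InputKind.CONCEPTS)` returns the term used for AAC-style input.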