A diphone is a sequence of two adjacent [[phone|phones]] in speech. Diphones are a common choice of unit for text-to-speech systems based on [[concatenative synthesis]]. A diphone spans from the steady state (roughly the middle) of one phone to the steady state of the next, so it captures the acoustic changes during the transition between them. Consecutive diphones are therefore joined at relatively stable points rather than in the middle of a transition, and the result sounds more natural than speech assembled from isolated phones. An inventory with one recorded example of each phone pair that occurs in the language is enough to synthesize arbitrary words. A language with N phones (counting silence as a phone) has at most N² diphones, and in practice fewer, because some pairs never occur. Olive (1977) described rule-based synthesis of speech from such dyadic units.

![[diphones.png]]

Example of diphone segmentation from Mihkla et al. (1999). Panel (2) shows the phone boundaries, and panel (4) shows the separated diphones stored in the diphone database.

## References

- Olive, J. (1977). Rule synthesis of speech from dyadic units. In *ICASSP '77, IEEE International Conference on Acoustics, Speech, and Signal Processing*, vol. 2, pp. 568–570. IEEE.
- Mihkla, M., Eek, A., & Meister, E. (1999). Text-to-speech synthesis of Estonian. In *Proc. Eurospeech 1999*. doi:10.21437/Eurospeech.1999-465.