Abstract

Our Indonesian concatenative speech synthesis system had a linking problem between a vowel-initial syllable and the phoneme preceding it. Unlike in other languages, in Indonesian the power of the speech signal decreases at this kind of boundary (sometimes to the point of a pause), and the phonemes before and after the boundary are uttered separately. For example, the underlined phonemes on either side of the boundary in “buah apel” (apple fruit) should not be uttered continuously but kept separate. This is not the case in English, where the underlined phonemes in “an apple” are linked. Previously, we did not treat this kind of low-power event (“lpow”) explicitly; the lpow was generated only indirectly from syllable boundary information and was sometimes too short, which caused the linking problem described above. In this paper, we propose treating the lpow explicitly. The lpow is treated like a phoneme during model training, so that it is generated appropriately during synthesis. We confirmed that introducing the lpow makes the synthesized speech more natural.
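
As a rough illustration of what treating the lpow like a phoneme could look like at the label level, the following minimal Python sketch inserts an explicit `lpow` token before every vowel-initial syllable that has a preceding phoneme. It is not the authors' implementation: the function name `insert_lpow`, the five-letter vowel set, the syllabification of “buah apel”, and the choice to apply the rule inside words as well as across word boundaries are all assumptions made for this example.

```python
# Assumption: a simplified vowel inventory, used only for illustration.
VOWELS = {"a", "i", "u", "e", "o"}


def insert_lpow(syllables, marker="lpow"):
    """Flatten syllables into one phoneme sequence, inserting `marker`
    before each vowel-initial syllable that follows another phoneme."""
    sequence = []
    for syllable in syllables:
        if not syllable:
            continue  # ignore empty syllables
        # A marker is only needed when there is a preceding phoneme.
        if sequence and syllable[0] in VOWELS:
            sequence.append(marker)
        sequence.extend(syllable)
    return sequence


# Hypothetical syllabification of "buah apel": bu-ah a-pel
labels = insert_lpow([["b", "u"], ["a", "h"], ["a"], ["p", "e", "l"]])
print(" ".join(labels))  # b u lpow a h lpow a p e l
```

With the token present in the label sequence, a training pipeline could assign the lpow its own model and duration in the same way as any phoneme, rather than relying on syllable boundary information alone.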
