Abstract

Speech can be viewed as a combination of voiced and unvoiced regions. Voiced speech is produced by vibration of the vocal cords, and the vibration pattern differs across emotions. During the production of some consonant sound units, the vocal cords do not vibrate; consonants are therefore less effective for conveying emotion in the speech signal. In this paper, we consider only vowel regions for emotion synthesis, using three prosody parameters: duration, intensity, and pitch pattern. Vowel-like regions (VLRs) are identified using vowel onset and offset points, which mark the start and end of each VLR. We observe that, when emotional speech is synthesized from neutral speech, it is mainly the vowel regions of the utterance that are modified significantly. Our experimental results show that emotion synthesis using prosody modification of VLRs alone is significantly better than emotion synthesis using prosody modification at the syllable level, and that it is also more efficient in terms of time. With vowel-level prosody modification, the average mean opinion scores for angry, happy, and fear emotional speech are 3.85, 3.60, and 4.03, respectively. These are higher than the corresponding scores for syllable-level prosody modification, which are 3.56, 3.17, and 3.92.
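To make the reported comparison concrete, the short sketch below computes the per-emotion difference between the two sets of mean opinion scores quoted above, along with each method's average over the three emotions. The variable and function names (`vlr_mos`, `syllable_mos`, `mos_gain`, `mean_score`) are illustrative and not taken from the paper; only the six score values come from the abstract.

```python
# Mean opinion scores as quoted in the abstract.
# The dictionary and function names are hypothetical, chosen for illustration.
vlr_mos = {"angry": 3.85, "happy": 3.60, "fear": 4.03}
syllable_mos = {"angry": 3.56, "happy": 3.17, "fear": 3.92}


def mos_gain(vlr, syllable):
    """Per-emotion difference (VLR-level minus syllable-level), rounded to 2 places."""
    return {emotion: round(vlr[emotion] - syllable[emotion], 2) for emotion in vlr}


def mean_score(scores):
    """Arithmetic mean of the scores across emotions, rounded to 2 places."""
    return round(sum(scores.values()) / len(scores), 2)


gains = mos_gain(vlr_mos, syllable_mos)
vlr_mean = mean_score(vlr_mos)
syllable_mean = mean_score(syllable_mos)

for emotion, gain in gains.items():
    print(f"{emotion}: +{gain:.2f}")
print(f"mean over emotions: VLR {vlr_mean:.2f} vs syllable {syllable_mean:.2f}")
```

On these figures, the gain is largest for happy (0.43) and smallest for fear (0.11), with angry in between (0.29).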
