Abstract

An emotion is made of several components such as physiological changes in the body, subjective feelings and expressive behaviors. These changes in speech signal are mainly observed in prosody parameters such as pitch, duration and energy. Hindi language is mostly syllabic in nature. Syllables are the most suitable basic units for the analysis and synthesis of speech. Therefore, vowel onset point detection method is used to segment the speech utterance into syllable like units. In this work, prosody parameters are modified using instants of significant excitation (epochs) and these instants are detected using zero frequency filtering-based method. Epoch locations in the voiced speech correspond to instants of glottal closure, and in the unvoiced region, they correspond to some random instants of significant excitation. Anger, happiness and sadness emotions are considered as target emotions in the proposed emotion conversion framework. Feedforward neural network models are explored for mapping the prosodic parameters between neutral and target emotions. Predicted prosodic parameters of the target emotion are incorporated into neutral speech at syllable level to produce the desired emotional speech. After incorporating the emotion-specific prosody, perceptual quality of the transformed speech is evaluated by subjective tests.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call