Abstract

Emotions express a person's internal state of being, and this state is reflected in speech utterances. Emotions affect time-domain characteristics of the speech signal, namely intonation patterns, speech rate, and the short-term energy function. Conventional text-to-speech (TTS) systems are built to produce speech utterances for a given text without any emotion, which can be called neutral speech. Building a TTS system that produces speech utterances with a desired emotion is not trivial, since a separate speech corpus must be carefully collected and a separate system built for each emotion. Therefore, the current work focuses on incorporating happiness into neutral speech using signal processing algorithms. To this end, neutral and happy speech are analyzed, and it is found that happiness can be perceived in certain emotive words in a sentence. Thus, to introduce happiness into neutral speech, these emotive keywords are identified and the above-mentioned time-domain parameters are modified. Linear prediction-based synthesis of happy speech is performed first. To improve the quality of the synthesized speech, time-domain pitch-synchronous overlap-add (TD-PSOLA) is then used. Subjective evaluation yields a mean opinion score of 2.05 (out of a maximum of 3) for happy speech synthesized using linear prediction and 2.53 for happy speech synthesized using TD-PSOLA.
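The abstract does not spell out how the three parameters are modified on an emotive keyword. As a minimal sketch, assuming the pitch marks of a voiced keyword segment are already available (pitch-mark detection and the linear prediction route are not shown), the generic textbook form of TD-PSOLA below scales F0 by pitch_factor, duration by time_factor, and amplitude by gain. The function names td_psola and short_term_energy, their parameters, and any factor values are hypothetical illustrations, not the authors' implementation or settings.

```python
import math


def hann(length):
    """Symmetric Hann window; two windows of length 2P+1 offset by P sum to 1."""
    if length <= 1:
        return [1.0] * length
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (length - 1)) for k in range(length)]


def short_term_energy(x, frame=160, hop=80):
    """Sum of squared samples in each frame (rectangular window)."""
    return [sum(s * s for s in x[i:i + frame]) for i in range(0, len(x) - frame + 1, hop)]


def td_psola(x, marks, pitch_factor=1.0, time_factor=1.0, gain=1.0):
    """Hypothetical sketch of generic TD-PSOLA, not the paper's exact method.

    x:            list of samples of a voiced segment
    marks:        increasing analysis pitch-mark indices (assumed given)
    pitch_factor: > 1 raises F0 (synthesis marks placed closer together)
    time_factor:  > 1 lengthens the segment (slower speech rate)
    gain:         amplitude scale per grain (short-term energy scales by gain**2)
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    out_len = int(len(x) * time_factor)
    y = [0.0] * out_len
    # Local pitch period at each analysis mark: distance to the next mark.
    periods = [marks[i + 1] - marks[i] for i in range(len(marks) - 1)]
    periods.append(periods[-1])
    t_s = marks[0] * time_factor
    while t_s < out_len:
        # Map the synthesis instant back to analysis time; use the nearest mark.
        t_a = t_s / time_factor
        i = min(range(len(marks)), key=lambda k: abs(marks[k] - t_a))
        p = periods[i]
        win = hann(2 * p + 1)
        centre = int(round(t_s))
        # Overlap-add a two-period windowed grain centred on the synthesis mark.
        for k in range(-p, p + 1):
            src, dst = marks[i] + k, centre + k
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += gain * win[k + p] * x[src]
        # Next synthesis mark: the local period divided by pitch_factor.
        t_s += p / pitch_factor
    return y
```

For example, td_psola(x, marks, pitch_factor=1.2, time_factor=0.9, gain=1.3) would raise F0, shorten, and boost a keyword segment; these values are arbitrary and only show the call. Reshaping an intonation pattern, rather than shifting it uniformly, would need a per-mark pitch contour instead of one constant factor.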
