Abstract

Speech has long been an effective medium for human-machine interaction. A conventional text-to-speech (TTS) system produces monotonous speech that lacks appropriate emotion. Including emotion in such synthesis systems makes the output more expressive and reduces its monotony. Emotions influence the time-domain parameters of a speech signal, such as short-time energy, duration, and pitch contour. Hence, to incorporate a desired emotion into neutral speech, this work uses signal processing methods to modify these prosodic parameters in the time domain, either in a few words or across the entire utterance. An initial analysis compares neutral speech with happy and sad speech. Based on the observations from this analysis, the parameters of the speech signal are varied using the TD-PSOLA technique. The short-time energy, duration, and pitch contour of the neutral speech are modified, and the results are analyzed quantitatively to decide which combination of parameter modifications best synthesizes emotional speech from neutral speech.
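As a rough illustration of one of the time-domain parameters named above, the following sketch computes short-time energy frame by frame and applies a uniform gain to it. It is not the authors' implementation and does not perform TD-PSOLA; the function names, the frame length, the hop size, the gain value, and the synthetic test tone are all assumptions chosen for illustration.

```python
import math

def short_time_energy(signal, frame_len, hop):
    """Return the sum of squared samples for each full frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def scale_energy(signal, gain):
    """Scale amplitudes so that every frame's energy is multiplied by `gain`."""
    factor = math.sqrt(gain)  # energy grows with the square of amplitude
    return [factor * x for x in signal]

# Hypothetical stand-in for neutral speech: a 200 Hz sine at 8 kHz, 0.1 s long.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]

# Assumed framing: 20 ms frames (160 samples) with a 10 ms hop (80 samples).
neutral_energy = short_time_energy(tone, frame_len=160, hop=80)
louder_energy = short_time_energy(scale_energy(tone, 2.0), frame_len=160, hop=80)
```

Comparing `neutral_energy` with `louder_energy` frame by frame shows the effect of the gain. In practice the gain could be applied only to the frames that cover selected words, in line with the idea of modifying either a few words or the whole utterance.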
