Abstract

Pitch pulses were electronically derived from the utterances of three male native speakers of American English who each read eight neutral test sentences in certain “emotional” modes, i.e., as a question, an objective statement, a fearful utterance, a happy utterance, etc. A fixed-vowel POVO-type synthesizer was excited by these pitch pulses. The pitch perturbations, or rapid variations in the fundamental excitation rate, could be smoothed out and the POVO could be amplitude-modulated with a signal derived from the original speech envelope amplitude. Tapes were recorded and presented to separate groups of naive listeners who categorized the emotional modes in forced judgment tests. Results of the tests show that with unprocessed speech, the listeners were able to correctly identify the emotional content 85% of the time. When only pitch information was presented, correct identification was made 44% of the time. When amplitude information was added to the pitch information, the identification rose to 47%. Smoothing the pitch information with a 40-msec time constant reduced the identifications to 38%, while 100-msec smoothing reduced the identifications to 25%. A 120-cps monotone with amplitude information derived from the original speech envelope amplitude resulted in 14% identifications.
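
The smoothing step described above can be illustrated with a minimal digital sketch. It assumes the smoothing behaves like a first-order low-pass filter with the stated time constant (40 or 100 msec). The abstract does not specify the circuit, so the function names, the 1000-frames-per-second contour rate, and the synthetic jittered contour are all hypothetical and chosen only for illustration.

```python
import math


def smooth_pitch(f0_hz, frame_rate_hz, time_constant_s):
    """Smooth a pitch contour with a one-pole low-pass filter.

    Assumption: the smoothing in the study is modeled here as a simple
    first-order filter; the original apparatus is not described in detail.
    """
    # Per-frame coefficient equivalent to an RC filter with the given time constant.
    alpha = 1.0 - math.exp(-1.0 / (frame_rate_hz * time_constant_s))
    smoothed = []
    state = f0_hz[0]  # start at the first value to avoid a start-up transient
    for value in f0_hz:
        state += alpha * (value - state)
        smoothed.append(state)
    return smoothed


def perturbation(contour):
    """Mean absolute frame-to-frame change, a rough measure of pitch perturbation."""
    steps = [abs(b - a) for a, b in zip(contour, contour[1:])]
    return sum(steps) / len(steps)


# Hypothetical contour: 120 cps with alternating +/-5 cps jitter, 1000 frames per second.
contour = [120.0 + (5.0 if i % 2 else -5.0) for i in range(200)]
smooth_40 = smooth_pitch(contour, 1000.0, 0.040)
smooth_100 = smooth_pitch(contour, 1000.0, 0.100)
```

Under these assumptions, the longer time constant removes more of the rapid variation, which is consistent in direction with the drop in identification from 38% (40 msec) to 25% (100 msec) reported above.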
