Abstract

Previous studies of vocal-emotion recognition with noise-vocoded speech have shown that the temporal modulation cues carried by the temporal envelope play an important role in the perception of vocal emotion. To clarify which features of the temporal envelope contribute to this perception, a method based on the mechanism of modulation frequency analysis in the auditory system is necessary. In this study, auditory-based modulation spectral features were used to account for perceptual data collected in vocal-emotion recognition experiments using noise-vocoded speech. First, the modulation spectrogram of the emotional noise-vocoded speech was calculated using an auditory-based modulation filterbank. Then, ten types of modulation spectral features were extracted from the modulation spectrograms. Finally, the modulation spectral features were compared with the perceptual data to investigate the contribution of the temporal envelope to the perception of vocal emotion with noise-vocoded speech. The results showed high correlations between the modulation spectral features and the perceptual data. The modulation spectral features should therefore be useful in accounting for the perceptual processing of vocal emotion with noise-vocoded speech. [Work supported by JSPS KAKENHI Grant Number JP. 17J08312, and Grant-in-Aid for Scientific Research on Innovative Areas (No. 18H05004) from MEXT, Japan.]
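As a rough illustration of the kind of analysis described above, the following Python sketch measures how much energy a temporal envelope carries at a few modulation frequencies and defines a correlation helper for comparing such a feature with perceptual scores. It is a simplified stand-in, not the method used in the study: a single-bin Fourier projection replaces the auditory-based modulation filterbank, the octave-spaced modulation frequencies and the 4 Hz synthetic envelope are assumptions chosen for the example, and the names `modulation_energies` and `pearson` are hypothetical.

```python
import math


def modulation_energies(envelope, fs, mod_freqs):
    """Energy of a temporal envelope at each modulation frequency.

    Assumption: a single-bin Fourier projection is used here in place of
    an auditory-based modulation filterbank.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]  # remove the DC component
    energies = []
    for f in mod_freqs:
        w = 2.0 * math.pi * f / fs
        re = sum(x * math.cos(w * i) for i, x in enumerate(centered))
        im = sum(x * math.sin(w * i) for i, x in enumerate(centered))
        energies.append((re * re + im * im) / (n * n))
    return energies


def pearson(xs, ys):
    """Pearson correlation between a feature and perceptual scores."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic envelope (assumption): 2 s sampled at 100 Hz, modulated at 4 Hz.
fs = 100
envelope = [1.0 + 0.8 * math.sin(2.0 * math.pi * 4.0 * i / fs)
            for i in range(2 * fs)]

# Octave-spaced modulation frequencies in Hz (assumption).
mod_freqs = [1.0, 2.0, 4.0, 8.0, 16.0]

energies = modulation_energies(envelope, fs, mod_freqs)
peak_freq = mod_freqs[energies.index(max(energies))]  # 4.0 for this envelope
```

In an actual analysis, the envelope would come from each band of the noise-vocoded speech, and `pearson` would be applied to one feature value per stimulus paired with the corresponding perceptual score.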
