Abstract
This paper introduces a multimodal emotion recognition system based on two modalities: affective speech and facial expression. For affective speech, common low-level descriptors comprising prosodic and spectral audio features (energy, zero-crossing rate, MFCC, LPC, PLP, and their temporal derivatives) are extracted, whereas a novel visual feature extraction method is proposed for facial expression. This method exploits the displacement of specific facial landmarks across consecutive frames of an utterance. The time series of temporal variations for each landmark is analyzed individually with the discrete wavelet transform, a widely used tool in signal processing, to extract primary visual features; the features of all landmarks are then concatenated to construct the final feature vector. To reduce the complexity of the derived models and improve efficiency, a variety of dimensionality-reduction schemes are applied. Furthermore, to exploit the advantages of multimodal emotion recognition, feature-level fusion of the audio and the proposed visual features is examined. Experiments conducted on the SAVEE, RML and eNTERFACE05 databases demonstrate the effectiveness of the proposed visual feature extraction method in terms of standard performance criteria.
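The visual pipeline described above — decomposing each landmark's displacement signal with a discrete wavelet transform and concatenating the coefficients of all landmarks — can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper does not specify the wavelet family or decomposition depth here, so a single-level Haar DWT is assumed, and the example signals are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform.

    Returns (approximation, detail) coefficient lists.
    Assumes the signal length is even.
    """
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def landmark_features(displacement_signals):
    """Build a visual feature vector from per-landmark displacement signals.

    Each signal is the frame-to-frame displacement of one facial landmark
    over an utterance; its DWT coefficients are the primary features, and
    the features of all landmarks are concatenated.
    """
    features = []
    for sig in displacement_signals:
        approx, detail = haar_dwt(sig)
        features.extend(approx)
        features.extend(detail)
    return features

# Hypothetical example: 2 landmarks, each tracked over 4 frames.
signals = [[0.0, 1.0, 1.0, 0.0], [2.0, 2.0, 0.0, 0.0]]
vec = landmark_features(signals)  # 8 coefficients total
```

In a real system the resulting vector would then pass through one of the dimensionality-reduction schemes mentioned above before classification or fusion with the audio features.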