Abstract

Continuous affect recognition has great potential in human-computer interaction applications. A key question is how to efficiently fuse speech and facial information to infer a person's affective state from data captured in real-world conditions. Currently, late fusion is commonly used in multi-modal continuous affect recognition to improve system performance; however, late fusion ignores the complementarity and redundancy between the streams of the different modalities. In this work, we propose an efficient model-level fusion approach for audiovisual continuous affect recognition. First, we propose an LSTM-based model-level fusion approach that accounts for the complementarity and redundancy between multiple streams from different modalities. In addition, our model can efficiently incorporate side information, such as gender, using an adaptive weight network. Finally, we design an effective optimization strategy based on deep supervision for training the proposed audiovisual continuous affect recognition model. We demonstrate the effectiveness of our approach on the RECOLA dataset. Our experimental results show that the proposed adaptive weight network improves performance compared to a plain neural network without adaptive weights, and that our approach obtains notable improvements on both arousal and valence in terms of the concordance correlation coefficient (CCC) compared to state-of-the-art early fusion and model-level fusion approaches. We therefore believe the proposed approach offers a promising direction for further improving the performance of audiovisual continuous affect recognition.
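The evaluation metric named in the abstract, the concordance correlation coefficient (CCC), combines Pearson correlation with penalties for mean and scale mismatch between predictions and gold annotations. As a reference point, here is a minimal NumPy sketch of the standard CCC formula (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient:
    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).
    Returns 1.0 for perfect agreement, values below 1 as the
    predictions drift in mean, scale, or correlation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu_t, mu_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    # Population covariance between annotations and predictions.
    cov = ((y_true - mu_t) * (y_pred - mu_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)
```

Unlike plain Pearson correlation, CCC drops below 1 even for perfectly correlated predictions if they are systematically biased, which is why it is the preferred metric for continuous arousal/valence prediction on RECOLA.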
