Abstract
Recent research has found that depression manifests primarily in patients' verbal and facial expressions. Because facial expressions and vocal intonation naturally co-occur, facial and vocal information serve as core indicators for depression recognition, which makes it important to explore how deep learning can be used effectively for multimodal depression detection. We propose a novel trilateral bimodal encoding model (MEN), an attentional decision fusion (ADF) module, and a feature extraction fusion strategy. For multimodal depression diagnosis, we employ a hybrid fusion approach that combines early intra-modality fusion with late inter-modality fusion. In the feature extraction fusion component, we combine different representations of the same modality before feeding them into the network for training, which enhances the depression-relevant features in the data. In our multimodal encoding network, Convolutional Neural Networks (CNN) extract frame-level information, while Bidirectional Long Short-Term Memory (BiLSTM) captures long-term context and dependencies. Finally, the ADF module integrates the three streams of information through attention-based fusion to regress the depression score. Extensive experiments were conducted on two public datasets, AVEC2013 and AVEC2014, where the mean absolute error/root mean squared error (MAE/RMSE) of the predicted depression scores was 6.48/8.91 and 7.01/9.38, respectively. These results demonstrate that our hybrid fusion method outperforms traditional early or late fusion methods.
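To make the late-fusion and evaluation steps concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each of the three streams has already produced a scalar depression score and that a scalar attention logit is available per stream. The paper's ADF module fuses learned representations inside the network, so score-level fusion is a deliberate simplification here. The names `softmax`, `fuse_scores`, `mae`, and `rmse` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of attention logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(stream_scores: Sequence[float],
                attention_logits: Sequence[float]) -> float:
    """Attention-weighted average of per-stream score predictions.

    Assumption: one scalar score and one scalar logit per stream. In a
    trained model the logits would be produced by the network itself.
    """
    if len(stream_scores) != len(attention_logits):
        raise ValueError("one attention logit is required per stream")
    weights = softmax(attention_logits)
    return sum(w * s for w, s in zip(weights, stream_scores))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between reference and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between reference and predicted scores."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from AVEC2013 or AVEC2014.
    fused = fuse_scores([10.0, 20.0, 30.0], [0.0, 0.0, 0.0])
    print(fused)
    print(mae([10.0, 20.0], [13.0, 16.0]), rmse([10.0, 20.0], [13.0, 16.0]))
```

With equal logits the fusion reduces to a plain average of the three streams. Unequal logits shift weight toward the streams the model considers more reliable, which is the intuition behind attention-based decision fusion.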