Abstract

Depression is a severe psychological condition that affects millions of people worldwide. As depression has received more attention in recent years, it has become imperative to develop automatic methods for detecting depression. Although numerous machine learning methods have been proposed for estimating the levels of depression via audio, visual, and audiovisual emotion sensing, several challenges still exist. For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or features effectively. In addition, how to include other information or tasks to enhance the estimation accuracy is also one of the challenges. In this study, we propose a multi-modal adaptive fusion transformer network for estimating the levels of depression. Transformer-based models have achieved state-of-the-art performance in language understanding and sequence modeling. Thus, the proposed transformer-based network is utilized to extract long-term temporal context information from uni-modal audio and visual data in our work. This is the first transformer-based approach for depression detection. We also propose an adaptive fusion method for adaptively fusing useful multi-modal features. Furthermore, inspired by current multi-task learning work, we also incorporate an auxiliary task (depression classification) to enhance the main task of depression level regression (estimation). The effectiveness of the proposed method has been validated on a public dataset (AVEC 2019 Detecting Depression with AI Sub-challenge) in terms of the PHQ-8 scores. Experimental results indicate that the proposed method achieves better performance compared with currently state-of-the-art methods. Our proposed method achieves a concordance correlation coefficient (CCC) of 0.733 on AVEC 2019 which is 6.2% higher than the accuracy (CCC = 0.696) of the state-of-the-art method.

Highlights

  • Depression is one of the most severe mental disorders worldwide, which can severely affect people’s daily lives

  • To investigate the effect of transformer-based networks on the correlation coefficient (CCC) scores and root mean square error (RMSE) values, we use single features as inputs of the networks, and the task is set to PHQ-8 regression

  • The experimental results indicated that the use of the transformer model for depression detection can improve the final prediction performance

Read more

Summary

Introduction

Depression is one of the most severe mental disorders worldwide, which can severely affect people’s daily lives. There are still many debates about the causes of depression, but the three main causes are psychological, genetic, and social and environmental problems. As numerous people suffer from depression worldwide, it has become important to detect depression using computer-aided methods. The detection of depression is a challenging task as many of its symptoms are widely distributed in ages, gender, regions, and cultures. The limitations in the detection of depression using computer-aided methods necessitate the development of more effective methods to accomplish this task. Numerous approaches have been proposed to automatically detect the levels of depression

Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call