Improving depression prediction using a novel feature selection algorithm coupled with context-aware analysis

Zhijun Dai, Heng Zhou, Qingfang Ba, Yang Zhou, Lifeng Wang, Guochen Li

doi:10.1016/j.jad.2021.09.001

Abstract

Background: Developing a machine-learning-based depression prediction method that uses information from long-term recordings is both important and challenging for the clinical diagnosis of depression.

Methods: We developed a novel two-stage feature selection algorithm and applied it to the high-dimensional feature set (over thirty thousand features) constructed by a context-aware analysis of the DAIC-WOZ data set, comprising audio, video, and semantic features. Prediction performance was compared with seven reference models. The preferred topics and feature categories associated with the retained features were also analyzed.

Results: The proposed method selected parsimonious subsets (tens of features) in each prediction task. We obtained the best performance in depression classification, with an F1-score of 0.96 (0.67), Precision of 1.00 (0.63), and Recall of 0.92 (0.71) on the development set (test set). We also achieved promising results in depression severity estimation, with an RMSE of 4.43 (5.11) and an MAE of 3.22 (3.98), differing only marginally from the best reference model (random forest with ‘Selected-Text’ features). The five most important topics related to depression were identified. Audio features predominated over the other feature categories in depression classification, whereas the three feature categories contributed almost equally to severity estimation.

Limitations: More depression samples should be added to the database we used. The second stage of feature selection is relatively time-consuming.

Conclusion: This depression recognition pipeline, together with the preferred topics and feature categories, is expected to be useful in supporting the diagnosis of psychological distress conditions.
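
The classification figures in the Results can be cross-checked against one another, since the F1-score is the harmonic mean of Precision and Recall. The sketch below is an illustration only and is not taken from the authors' pipeline; the function name `f1_score` and the dictionary `reported` are hypothetical, and the inputs are the rounded values quoted in the abstract, so agreement is expected only to two decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall) pairs as reported in the abstract (already rounded).
reported = {
    "development": (1.00, 0.92),
    "test": (0.63, 0.71),
}

for split, (p, r) in reported.items():
    print(f"{split}: F1 = {f1_score(p, r):.2f}")
# development: F1 = 0.96
# test: F1 = 0.67
```

Both recomputed values match the reported F1-scores of 0.96 and 0.67.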

Full Text