Abstract

Depression is a global mental health problem that lacks effective screening methods for early detection and treatment. This paper aims to facilitate large-scale depression screening by focusing on the speech depression detection (SDD) task. Modeling the raw signal directly requires a large number of parameters, so existing deep learning-based SDD models mainly take fixed Mel-scale spectral features as input. However, these features are not designed for depression detection, and their manual settings limit the exploration of fine-grained feature representations. In this paper, we learn effective representations of the raw signal from an interpretable perspective. Specifically, we present a joint learning framework with attention-guided learnable time-domain filterbanks for depression classification (DALF), which combines a depression filterbanks features learning (DFBL) module with a multi-scale spectral attention learning (MSSA) module. DFBL produces biologically meaningful acoustic features by employing learnable time-domain filters, and MSSA guides the learnable filters to better retain the useful frequency sub-bands. We collect a new dataset, the Neutral Reading-based Audio Corpus (NRAC), to facilitate research in depression analysis, and we evaluate DALF on NRAC and on the public DAIC-WOZ dataset. The experimental results show that our method outperforms state-of-the-art SDD methods, with an F1 score of 78.4% on the DAIC-WOZ dataset and F1 scores of 87.3% and 81.7% on the two parts of the NRAC dataset. By analyzing the filter coefficients, we find that the most important frequency range identified by our method is 600–700 Hz, which corresponds to the Mandarin vowels /e/ and /ê/ and can be considered an effective biomarker for the SDD task. Taken together, these results indicate that DALF is a promising approach to depression detection.
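
To make the idea of a time-domain filterbank concrete, the following is a minimal sketch of the forward pass only, not the authors' DFBL module. It assumes a Gabor parameterization (a Gaussian envelope times a complex sinusoid), a 16 kHz sample rate, Mel-spaced initial center frequencies, and a constant 100 Hz bandwidth; all function names and parameter values are hypothetical. In a learnable version, `center_hz` and `bandwidth_hz` would be trainable parameters updated by gradient descent rather than fixed arrays.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel formula (an assumption; the paper may use another).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def gabor_filterbank(center_hz, bandwidth_hz, sample_rate=16000, kernel_size=401):
    """Build complex Gabor filters, one row per (center, bandwidth) pair."""
    t = (np.arange(kernel_size) - kernel_size // 2) / sample_rate  # seconds
    sigma = 1.0 / (2.0 * np.pi * bandwidth_hz)  # time-domain std deviation
    envelope = np.exp(-0.5 * (t[None, :] / sigma[:, None]) ** 2)
    carrier = np.exp(2j * np.pi * center_hz[:, None] * t[None, :])
    filters = envelope * carrier
    # L1-normalize so each filter has roughly unit gain at its center frequency.
    return filters / np.abs(filters).sum(axis=1, keepdims=True)

def filterbank_features(wave, filters, hop=160):
    """Convolve, take squared modulus, average-pool per hop, log-compress."""
    energy = np.stack(
        [np.abs(np.convolve(wave, f, mode="same")) ** 2 for f in filters]
    )
    n_frames = energy.shape[1] // hop
    pooled = energy[:, : n_frames * hop].reshape(len(filters), n_frames, hop)
    return np.log(pooled.mean(axis=2) + 1e-6)

# Hypothetical setup: 40 filters, Mel-spaced centers between 60 Hz and 7.8 kHz.
sample_rate = 16000
centers = mel_to_hz(np.linspace(hz_to_mel(60.0), hz_to_mel(7800.0), 40))
bandwidths = np.full(40, 100.0)  # constant bandwidth, for simplicity only
filters = gabor_filterbank(centers, bandwidths, sample_rate)

# A one-second 650 Hz tone, inside the 600-700 Hz band the abstract highlights.
time = np.arange(sample_rate) / sample_rate
tone = np.cos(2.0 * np.pi * 650.0 * time)
feats = filterbank_features(tone, filters)
strongest = int(np.argmax(feats.mean(axis=1)))
```

Here `feats` has one row per filter and one column per 10 ms frame, and `strongest` is the index of the filter that responds most to the tone. Inspecting which center frequencies and bandwidths a trained filterbank settles on is the kind of analysis the abstract refers to when it reports the 600–700 Hz range.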
