Abstract

The clinical diagnosis of major depressive disorder (MDD) relies heavily on subjective judgment assisted by questionnaires, which can result in a low detection rate of MDD. Automatic depression detection (ADD) technology based on physiological and psychological information provides an objective and quantitative approach to MDD detection. As a useful data modality, audio signals have attracted increasing interest in mental disorder detection. However, most recent audio-based depression detection methods underestimate the importance of carefully organizing the audio data. They either use equally split audio segments or build the model directly on the entire data sequence, both of which hinder the learning of task-specific features. To address this issue, we propose to reorganize the audio data at the response level. Based on this reorganization, we construct a novel end-to-end model that hierarchically learns discriminative features for accurate depression detection. The intra-response fusion and inter-response fusion stages facilitate the extraction and aggregation of ADD-specific information from multiple kinds of acoustic features. Experimental results show that our proposed method significantly outperforms other state-of-the-art audio-based methods. In addition, the flexibility and robustness of our model are validated.
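
The abstract does not give implementation details, but the two-stage design it describes can be illustrated with a minimal sketch, assuming a PyTorch setup with hypothetical module names, feature dimensions, and fusion operators: intra-response fusion combines several acoustic feature streams within each response into one embedding, and inter-response fusion aggregates the per-response embeddings into a recording-level prediction.

```python
# Minimal sketch (not the authors' implementation) of two-stage fusion:
# intra-response fusion merges multiple acoustic feature streams within each
# response; inter-response fusion aggregates the response embeddings.
# All module names, sizes, and feature dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class IntraResponseFusion(nn.Module):
    """Fuse several acoustic feature streams of one response into one vector."""

    def __init__(self, feat_dims, hidden=128):
        super().__init__()
        # one small encoder per acoustic feature type (e.g. MFCCs, spectral stats)
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in feat_dims
        )
        self.fuse = nn.Linear(hidden * len(feat_dims), hidden)

    def forward(self, feats):          # feats: list of (batch, feat_dim) tensors
        encoded = [enc(f) for enc, f in zip(self.encoders, feats)]
        return torch.relu(self.fuse(torch.cat(encoded, dim=-1)))  # (batch, hidden)


class InterResponseFusion(nn.Module):
    """Aggregate per-response embeddings of one subject into a prediction."""

    def __init__(self, hidden=128):
        super().__init__()
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # binary depressed / not-depressed logit

    def forward(self, response_embs):  # (batch, num_responses, hidden)
        _, h = self.gru(response_embs)
        return self.head(h[-1])        # (batch, 1)


# Usage with made-up numbers: 3 acoustic feature types, 5 responses per subject.
intra, inter = IntraResponseFusion([40, 64, 32]), InterResponseFusion()
responses = [[torch.randn(2, d) for d in (40, 64, 32)] for _ in range(5)]
embs = torch.stack([intra(r) for r in responses], dim=1)   # (2, 5, 128)
logits = inter(embs)
```

The point of the sketch is only the hierarchy: features are fused within a response before responses are fused across the interview, rather than modeling equally split segments or the whole sequence at once.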
