Abstract

Depression is one of the most prevalent mental disorders and can seriously affect one's life. Traditional depression diagnostics commonly rely on rating scales, which can be labor-intensive and subjective. In this context, automatic depression detection (ADD), which aims to assist medical experts in their diagnosis and analysis, has attracted increasing attention for its objectivity and reduced manual effort. A typical ADD model detects depression by automatically extracting task-specific features from medical records, such as video sequences, and feeding them into a classifier for assistive prediction. However, it remains challenging to effectively extract depression-specific information from long sequences, which hinders satisfactory accuracy. In this article, we propose a novel ADD method that learns and fuses features from visual cues. Specifically, we first construct a temporal dilated convolutional network (TDCN), in which multiple dilated convolution blocks (DCBs) are designed and stacked, to learn long-range temporal information from sequences. Then, a featurewise attention (FWA) module is adopted to fuse the different features extracted by the TDCNs. The module learns to assign weights to the feature channels, aiming to better incorporate different kinds of visual features and further enhance detection accuracy. Our method achieves state-of-the-art performance on the Distress Analysis Interview Corpus Wizard-of-Oz (DAIC-WOZ) dataset compared with other visual-feature-based methods, showing its effectiveness.
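To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract outlines: stacked dilated convolution blocks forming a TDCN branch per visual cue, and a featurewise attention module that reweights the concatenated branch features before classification. The exact block design, attention form, channel counts, and input dimensions are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class DilatedConvBlock(nn.Module):
    """Hypothetical DCB: a 1-D dilated convolution over the time axis
    with a residual connection; stacking blocks with growing dilation
    widens the temporal receptive field exponentially."""
    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        padding = (kernel_size - 1) // 2 * dilation  # preserve sequence length
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=padding, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        return self.relu(self.conv(x)) + x  # residual connection

class TDCN(nn.Module):
    """Temporal dilated convolutional network: stacked DCBs with
    dilations 1, 2, 4, ... followed by temporal average pooling."""
    def __init__(self, in_dim: int, channels: int = 64, num_blocks: int = 4):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.blocks = nn.Sequential(
            *[DilatedConvBlock(channels, dilation=2 ** i)
              for i in range(num_blocks)])

    def forward(self, x):  # x: (batch, time, in_dim)
        h = self.blocks(self.proj(x.transpose(1, 2)))
        return h.mean(dim=2)  # sequence-level feature: (batch, channels)

class FeaturewiseAttention(nn.Module):
    """FWA sketch: learns sigmoid gates over the channels of the
    concatenated features from several TDCN branches."""
    def __init__(self, fused_dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())

    def forward(self, feats):  # feats: list of (batch, channels) tensors
        fused = torch.cat(feats, dim=1)
        return fused * self.gate(fused)  # channelwise reweighting

# Usage: one TDCN branch per visual cue; input dims are placeholders,
# e.g. facial landmarks, gaze, and head pose features per frame.
branches = nn.ModuleList(TDCN(in_dim=d) for d in (136, 12, 6))
fwa = FeaturewiseAttention(fused_dim=64 * 3)
classifier = nn.Linear(64 * 3, 2)  # depressed vs. control

clips = [torch.randn(8, 300, d) for d in (136, 12, 6)]  # 300-frame sequences
logits = classifier(fwa([b(c) for b, c in zip(branches, clips)]))
print(logits.shape)  # torch.Size([8, 2])
```

The sigmoid gating here is one common way to realize channelwise attention; the paper's FWA may differ in its exact parameterization.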
