Depression recognition using a proposed speech chain model fusing speech production and perception features

Minghao Du, Shuang Liu, Tao Wang, Wenquan Zhang, Yufeng Ke, Long Chen, Dong Ming

doi:10.1016/j.jad.2022.11.060

Open Access

https://doi.org/10.1016/j.jad.2022.11.060

Journal: Journal of Affective Disorders	Publication Date: Nov 30, 2022
Citations: 17	License type: cc-by-nc-nd

Affiliation: Tianjin University

Abstract

Background: The growing number of patients with depression puts great pressure on clinical diagnosis. Audio-based diagnosis is a helpful auxiliary tool for early mass screening. However, current methods consider only speech perception features and ignore patients' vocal tract changes, which may partly explain their poor recognition performance.

Methods: This work proposes a novel machine speech chain model for depression recognition (MSCDR) that captures a text-independent depressive speech representation from the speaker's mouth to the listener's ear to improve recognition performance. In the proposed MSCDR, linear predictive coding (LPC) and Mel-frequency cepstral coefficient (MFCC) features are extracted to describe the processes of speech production and speech perception, respectively. A one-dimensional convolutional neural network and a long short-term memory network then sequentially capture intra- and inter-segment dynamic depressive features for classification.

Results: We tested the MSCDR on two public datasets with different languages and paradigms, namely the Distress Analysis Interview Corpus-Wizard of Oz and the Multi-modal Open Dataset for Mental-disorder Analysis. On the two datasets, respectively, the MSCDR achieved accuracies of 0.77 and 0.86 and average F1 scores of 0.75 and 0.86, outperforming other existing methods. This improvement reveals the complementarity of speech production and perception features in carrying depressive information.

Limitations: The sample size was relatively small, which may limit clinical translation to some extent.

Conclusion: This experiment demonstrates the good generalization ability and superiority of the proposed MSCDR and suggests that vocal tract changes in patients with depression deserve attention in audio-based depression diagnosis.
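To make the feature-fusion idea in the Methods concrete, the following is a minimal sketch of computing production-side LPC coefficients and perception-side MFCCs for one audio frame and concatenating them. It is not the authors' implementation: the abstract does not state the LPC order, frame length, window, or number of Mel filters, so every parameter value and function name here (`lpc`, `mfcc`, `fused_features`) is an assumption chosen for illustration. It is written in pure Python with a naive DFT so that it runs without dependencies; the CNN and LSTM stages are not shown.

```python
import math


def lpc(frame, order):
    """LPC coefficients a[1..order] via autocorrelation and Levinson-Durbin.

    Models x[t] ~ sum_k a[k] * x[t-k], i.e. an all-pole vocal tract filter.
    """
    n = len(frame)
    r = [sum(frame[t] * frame[t - lag] for t in range(lag, n))
         for lag in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: keep remaining zeros
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs: Hamming window, power spectrum, Mel filterbank, log, DCT-II."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
           for t, x in enumerate(frame)]
    # Naive DFT power spectrum (O(n^2); fine for a short illustrative frame).
    power = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(win))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(win))
        power.append((re * re + im * im) / n)

    def hz_to_mel(f):
        return 2595.0 * math.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edges equally spaced on the Mel scale, in fractional DFT bins.
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n / sample_rate
             for i in range(n_filters + 2)]
    log_energy = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        energy = sum(p * max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k, p in enumerate(power))
        log_energy.append(math.log(energy + 1e-10))  # epsilon avoids log(0)
    # DCT-II decorrelates the log filterbank energies.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energy))
            for c in range(n_coeffs)]


def fused_features(frame, sample_rate, lpc_order=12):
    """Concatenate production (LPC) and perception (MFCC) features."""
    return lpc(frame, lpc_order) + mfcc(frame, sample_rate)
```

Calling `fused_features(frame, 8000)` on a 256-sample frame returns a 25-dimensional vector under these assumed defaults (12 LPC coefficients followed by 13 MFCCs). In practice an audio library would replace the hand-written routines, and a sequence of such per-frame vectors would be the input to the convolutional and recurrent stages the abstract describes.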

Similar Papers

Children's Development of Self-Regulation in Speech Production
Ewen N Macdonald ... Paul Plante
Current Biology | VOL. 22
22 Dec 2011

Functionally integrated neural processing of linguistic and talker information: An event-related fMRI and ERP study
Caicai Zhang ... William S-Y Wang
NeuroImage | VOL. 124
04 Sep 2015

Vocal tract changes caused by phonation into a tube: A case study using computer tomography and finite-element modeling
Tomáš Vampola ... Jaromír Horáček
The Journal of the Acoustical Society of America | VOL. 129
01 Jan 2010

Infant Vocal Tract Development Analysis and Diagnosis by Cry Signals with CNN Age Classification
Chunyan Ji ... Yi Pan
13 Oct 2021
