Enhancing accuracy and privacy in speech-based depression detection through speaker disentanglement

Vijay Ravi, Jinhan Wang, Jonathan Flint, Abeer Alwan

doi:10.1016/j.csl.2023.101605

Abstract

Speech signals are valuable biomarkers for assessing an individual’s mental health, including the automatic identification of Major Depressive Disorder (MDD). A common approach is to employ features related to speaker identity, such as speaker embeddings. However, over-reliance on speaker-identity features in mental health screening systems can compromise patient privacy. Moreover, some aspects of speaker identity may not be relevant for depression detection and could act as a source of bias that hampers system performance. To overcome these limitations, we propose disentangling speaker-identity information from depression-related information. Specifically, we present four distinct disentanglement methods: adversarial speaker identification (SID)-loss maximization (ADV), SID-loss equalization with variance (LEV), SID-loss equalization using cross-entropy (LECE), and SID-loss equalization using KL divergence (LEKLD). Our experiments, which incorporated diverse input features and model architectures, yielded improved F1-scores for MDD detection and improved voice-privacy attributes, as quantified by Gain in Voice Distinctiveness (GVD) and De-Identification (DeID) scores. On the DAIC-WOZ dataset (English), LECE using ComparE16 features achieves the best F1-score of 80%, the audio-only state-of-the-art (SOTA) for depression detection, along with a GVD of −1.1 dB and a DeID of 85%. On the EATD dataset (Mandarin), ADV using the raw audio signal achieves an F1-score of 72.38%, surpassing the multi-modal SOTA, along with a GVD of −0.89 dB and a DeID of 51.21%. By reducing the dependence on speaker-identity-related features, our method offers a promising direction for speech-based depression detection that preserves patient privacy.
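
The four disentanglement objectives named in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: ADV is read as subtracting the SID cross-entropy from the training objective, and the three equalization variants are read as pushing the predicted speaker posterior toward a uniform distribution (by its variance, by cross-entropy against a uniform target, and by KL divergence from a uniform target). The function names, the `weight` parameter, and the way the penalty is combined with the depression loss are hypothetical and may differ from the paper.

```python
import math


def softmax(logits):
    """Convert raw speaker logits into a posterior over speakers."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sid_cross_entropy(logits, speaker_idx):
    """Ordinary SID loss: negative log-probability of the true speaker."""
    return -math.log(softmax(logits)[speaker_idx])


def adv_penalty(logits, speaker_idx):
    """ADV (assumed form): minimizing this term maximizes the SID loss."""
    return -sid_cross_entropy(logits, speaker_idx)


def lev_penalty(logits):
    """LEV (assumed form): variance of the speaker posterior.

    It is zero exactly when every speaker is predicted as equally likely.
    """
    probs = softmax(logits)
    mean = sum(probs) / len(probs)
    return sum((p - mean) ** 2 for p in probs) / len(probs)


def lece_penalty(logits):
    """LECE (assumed form): cross-entropy against a uniform speaker target."""
    probs = softmax(logits)
    uniform = 1.0 / len(probs)
    return -sum(uniform * math.log(p) for p in probs)


def lekld_penalty(logits):
    """LEKLD (assumed form): KL divergence from uniform to the posterior."""
    probs = softmax(logits)
    uniform = 1.0 / len(probs)
    return sum(uniform * math.log(uniform / p) for p in probs)


def total_loss(depression_loss, penalty, weight=0.1):
    """Hypothetical combination: depression loss plus a weighted penalty."""
    return depression_loss + weight * penalty


# A confident speaker prediction is penalized more than an uninformative one.
confident = [5.0, 0.0, 0.0, 0.0]
uninformative = [0.0, 0.0, 0.0, 0.0]
print(lekld_penalty(confident) > lekld_penalty(uninformative))
```

Under these assumed forms, the three equalization penalties are smallest when the model's output carries no information about which speaker is talking, while ADV instead rewards being wrong about the true speaker.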

Journal: Computer Speech & Language
Publication Date: Dec 26, 2023
Citations: 6
