Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment.

Apiwat Ditthapron, Emmanuel O Agu, Adam C Lammert

doi:10.1109/ojemb.2021.3063994

Open Access

https://doi.org/10.1109/ojemb.2021.3063994

Abstract

Goal: Smartphones can be used to passively assess and monitor patients’ speech impairments caused by ailments such as Parkinson’s disease, Traumatic Brain Injury (TBI), Post-Traumatic Stress Disorder (PTSD), and neurodegenerative diseases such as Alzheimer’s disease and dementia. However, passive audio recordings in natural settings often capture the speech of non-target speakers (cross-talk). Consequently, speaker separation, which identifies the target speaker’s speech in recordings that contain the voices of two or more speakers, is a crucial pre-processing step in such scenarios. Prior speech separation methods analyzed raw audio. However, to preserve speaker privacy, passively recorded smartphone audio is often reduced to derived speech features such as Mel-Frequency Cepstral Coefficients (MFCCs), and machine learning-based speech assessment is then performed on those features rather than on the raw audio. In this paper, we propose a novel method, Deep MFCC bAsed SpeaKer Separation (Deep-MASKS).

Methods: Deep-MASKS uses an autoencoder to reconstruct the MFCC components of an individual’s speech from an i-vector, x-vector, or d-vector representation of their speech learned during an enrollment period. Deep-MASKS uses a Deep Neural Network (DNN) to reconstruct the MFCC signal, which yields a more accurate, higher-order function than prior work that applied a mask. Unlike prior work that operates on utterances, Deep-MASKS operates on continuous audio recordings.

Results: Deep-MASKS outperforms baselines, reducing the Mean Squared Error (MSE) of MFCC reconstruction by up to 44% and the number of additional bits required to represent the entropy of clean speech by 36%.
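The abstract contrasts mask-based separation with DNN reconstruction of a target speaker's MFCCs conditioned on an enrolled speaker vector. The sketch below is a minimal, hypothetical illustration of that data flow and of the MSE metric, not the Deep-MASKS architecture. Everything in it is an assumption not taken from the paper: 13 MFCCs per frame, a 256-dimensional d-vector, a single ReLU hidden layer with untrained random weights, a constant mask for the baseline, and the helper names `init_params`, `mask_baseline`, `reconstruct_mfcc`, `mfcc_mse`, and `relative_reduction`.

```python
import numpy as np

N_MFCC = 13      # assumption: cepstral coefficients per frame
EMBED_DIM = 256  # assumption: d-vector size; i-vectors and x-vectors differ


def init_params(n_mfcc=N_MFCC, embed_dim=EMBED_DIM, hidden=128, seed=0):
    """Randomly initialise a hypothetical one-hidden-layer reconstruction net."""
    rng = np.random.default_rng(seed)
    in_dim = n_mfcc + embed_dim
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, n_mfcc)),
        "b2": np.zeros(n_mfcc),
    }


def mask_baseline(mixture, mask):
    """Mask-style estimate: element-wise scaling of the mixture MFCCs."""
    return mixture * mask


def reconstruct_mfcc(mixture, embedding, params):
    """Regress target-speaker MFCC frames from mixture frames plus a speaker vector.

    mixture:   (frames, n_mfcc) MFCCs of a continuous recording
    embedding: (embed_dim,) speaker vector learned at enrollment
    """
    frames = mixture.shape[0]
    # Repeat the same enrollment vector for every frame (frame-level conditioning).
    cond = np.broadcast_to(embedding, (frames, embedding.shape[0]))
    x = np.concatenate([mixture, cond], axis=1)
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]                 # MFCCs output directly, no mask


def mfcc_mse(estimate, clean):
    """Mean squared error between estimated and clean MFCC matrices."""
    estimate = np.asarray(estimate, dtype=float)
    clean = np.asarray(clean, dtype=float)
    if estimate.shape != clean.shape:
        raise ValueError(f"shape mismatch: {estimate.shape} vs {clean.shape}")
    return float(np.mean((estimate - clean) ** 2))


def relative_reduction(baseline_err, new_err):
    """Fraction by which new_err improves on baseline_err (0.44 means 44%)."""
    return (baseline_err - new_err) / baseline_err


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = rng.normal(size=(100, N_MFCC))
    interferer = rng.normal(size=(100, N_MFCC))
    # Toy additive mixture only: because of the log step, the MFCCs of a
    # summed waveform are NOT the sum of the individual MFCCs.
    mixture = clean + interferer
    d_vector = rng.normal(size=EMBED_DIM)

    params = init_params()  # untrained weights; a real system would be trained
    est = reconstruct_mfcc(mixture, d_vector, params)
    masked = mask_baseline(mixture, np.full_like(mixture, 0.5))
    print("reconstruction shape:", est.shape)
    print("mask MSE:", mfcc_mse(masked, clean))
    print("untrained net MSE:", mfcc_mse(est, clean))
```

The design point the sketch tries to capture is that a mask can only rescale features already present in the mixture, whereas a conditioned regressor can output arbitrary MFCC values, which is one reading of the abstract's "higher-order function". A reported "44% MSE reduction" corresponds to `relative_reduction(baseline_mse, new_mse) == 0.44`.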

Journal: IEEE Open Journal of Engineering in Medicine and Biology
Publication Date: Jan 1, 2021
Citations: 2
License type: CC BY 4.0

Similar Papers

PTSD and Combat-Related Injuries: Functional Neuroanatomy
K H Taber ... R A Hurley
Journal of Neuropsychiatry | VOL. 21
01 Feb 2009

Real-time prediction of upcoming respiratory events via machine learning using snoring sound signal.
Bochun Wang ... Ji Wu
Journal of clinical sleep medicine : JCSM : official publication of the American Academy of Sleep Medicine | VOL. 17
12 Apr 2021

Posttraumatic Stress Disorder Symptoms During the First Six Months After Traumatic Brain Injury
C H Bombardier ... J R Fann
Journal of Neuropsychiatry | VOL. 18
01 Nov 2006

Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN
U Kumaran ... Senthil Murugan Nagarajan
International Journal of Speech Technology | VOL. 24
13 Jan 2021