Abstract
Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning solutions. These systems usually show a significant drop in performance when test data differ from training data, due to the resulting domain shift. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. Considering both issues, in this paper we evaluate three unsupervised domain adaptation techniques applied to the SAD task. A baseline system is trained on a combination of data from different domains and then adapted to a new unseen domain, namely, data from Apollo space missions coming from the Fearless Steps Challenge. Experimental results demonstrate that domain adaptation techniques seeking to minimise the statistical distribution shift provide the most promising results. In particular, the Deep CORAL method reports a 13% relative improvement in the original evaluation metric when compared to the unadapted baseline model. Further experiments show that the cascaded application of Deep CORAL and pseudo-labelling techniques can improve the results even further, yielding a significant 24% relative improvement in the evaluation metric when compared to the baseline system.
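To illustrate the idea of minimising statistical distribution shift, the following is a minimal sketch of the CORAL loss that Deep CORAL adds to the task loss: the squared Frobenius distance between the second-order statistics (feature covariances) of source and target batches. This is a generic NumPy illustration of the loss term, not the authors' actual implementation or training setup.

```python
import numpy as np

def coral_loss(source, target):
    """CORAL loss between two feature batches of shape (n_samples, d):
    squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)."""
    d = source.shape[1]

    def cov(x):
        xm = x - x.mean(axis=0, keepdims=True)
        return xm.T @ xm / (x.shape[0] - 1)

    diff = cov(source) - cov(target)
    return (diff ** 2).sum() / (4 * d * d)
```

In Deep CORAL this term is computed on an intermediate layer's activations and weighted against the classification loss, so the network learns features whose covariance structure matches across domains.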
Highlights
Speech Activity Detection (SAD) aims to determine whether an audio signal contains speech or not, and its exact location in the signal
Inspired by our previous experiences participating in the Fearless Steps Challenge [23,24], that introduced a new audio domain in the research community, in this paper we aim to explore unsupervised domain adaptation techniques in the context of the SAD task
The corresponding figure presents the detection error trade-off (DET) curve and equal error rate (EER) for some of the best performing knowledge distillation systems compared to the unadapted baseline system
Summary
Speech Activity Detection (SAD) aims to determine whether an audio signal contains speech or not, and its exact location in the signal. This constitutes an essential preprocessing step in several speech-related applications such as speech and speaker recognition, as well as speech enhancement. SAD is used as a preliminary block to separate the segments of the signal that contain speech from those that contain only noise; in this way, the overall system processes only the speech segments. In [9], new optimisation techniques based on the area under the ROC curve are explored in the framework of a deep learning SAD system
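As a toy illustration of the SAD preprocessing block described above, the sketch below labels fixed-length frames as speech or non-speech using a simple log-energy threshold. This is only a didactic baseline with assumed parameter values (frame/hop lengths, threshold), not the deep learning system the paper evaluates.

```python
import numpy as np

def energy_sad(signal, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Toy energy-based SAD: mark a frame as speech when its log-energy
    is within `threshold_db` dB of the loudest frame in the signal."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    energies = np.array(
        [np.sum(signal[i * hop:i * hop + frame] ** 2) for i in range(n)]
    )
    log_e = 10 * np.log10(energies + 1e-10)
    # Boolean mask over frames: True = speech, False = non-speech
    return log_e > (log_e.max() + threshold_db)
```

A downstream recogniser would then keep only the frames (or contiguous segments) where the mask is True, exactly in the role of the preliminary block described above.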