QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization

Shu-Tong Niu, Lei Sun, Chin-Hui Lee, Jun Du, Yu Hu

doi:10.1109/taslp.2023.3244513

Abstract

We improve iterative separation-based speaker diarization (ISSD) with quality-aware dynamic masking (QDM) and call the resulting framework QDM-SSD. Compared with ISSD, QDM-SSD uses QDM to enhance the simulated data used for model adaptation, which alleviates the influence of errors in the speaker priors. In addition to purifying data quality, QDM-SSD makes the adaptation data sparse by automatically adjusting speaker overlap ratios according to data quality. Furthermore, sliding a window over the adaptation data allows clean regions in speech segments to be localized more accurately. Experiments on the two-speaker conversational telephone speech (CTS) corpus show that the proposed QDM-SSD framework achieves a relative reduction in diarization error rate (DER) of 18.56% compared with ISSD. Moreover, QDM-SSD is shown to generalize to other two-speaker, non-CTS data sets on which ISSD fails to work. Finally, we demonstrate that QDM-SSD can serve as a front-end that improves the performance of back-end automatic speech recognition.
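To make the "relative" figure concrete, the sketch below computes DER using its standard definition (missed speech, false alarm, and speaker confusion time divided by total reference speech time) and then the relative reduction between two systems. The function names and all durations are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """Standard DER: summed error durations over total reference speech time."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


def relative_reduction(baseline_der, new_der):
    """Relative DER reduction, expressed as a fraction of the baseline DER."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical durations in seconds (illustrative only, not from the paper).
baseline = diarization_error_rate(30.0, 20.0, 50.0, 1000.0)
improved = diarization_error_rate(25.0, 15.0, 40.0, 1000.0)

print(f"baseline DER: {baseline:.2%}")
print(f"improved DER: {improved:.2%}")
print(f"relative reduction: {relative_reduction(baseline, improved):.2%}")
```

A relative reduction is measured against the baseline's own error rate, so the same absolute DER improvement counts for more when the baseline DER is already low.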

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2023
Citations: 1

