Abstract
We improve iterative separation-based speaker diarization (ISSD) with quality-aware dynamic masking (QDM) and call the resulting framework QDM-SSD. Compared with ISSD, QDM-SSD uses QDM to enhance the simulated data used for model adaptation, alleviating the influence of errors in the speaker priors. In addition to purifying data quality, QDM-SSD makes the adaptation data sparse by automatically adjusting speaker overlap ratios according to data quality. Furthermore, applying a sliding window over the adaptation data allows clean regions in speech segments to be localized more accurately. Experiments on the two-speaker conversational telephone speech (CTS) corpus show that QDM-SSD achieves an 18.56% relative reduction in diarization error rate (DER) compared with ISSD. Moreover, QDM-SSD generalizes to other two-speaker non-CTS data sets on which ISSD fails to work. Finally, we demonstrate that QDM-SSD can serve as a front-end that improves the performance of back-end automatic speech recognition.
More From: IEEE/ACM Transactions on Audio, Speech, and Language Processing