Abstract

Automatic speaker verification (ASV) faces an unprecedented challenge from mask-wearing speakers, a consequence of the COVID-19 pandemic. Speakers wearing masks unconsciously alter their normal speaking style to compensate for the mask transfer effect. This changes the statistical distribution of their speech and produces a domain mismatch that domain adaptation (DA) algorithms can address. However, most DA algorithms reduce domain shift by aligning speaker-embedding distributions, which inevitably weakens the discriminative power of speaker representations in the target domain. To improve adaptation performance, we propose a self-supervised learning-based domain regularization (SDR) algorithm that focuses on the discriminative class structure by removing the domain effect. Unlike most DA algorithms, SDR adopts a co-training structure with two modules and uses a self-supervised learning strategy to explore both domain variables and discriminative speaker representations in the target domain. We collected speech data from 271 mask-wearing speakers and conducted ASV experiments. The results show that SDR improves performance by 10%–20% in terms of equal error rate (EER) compared with state-of-the-art adversarial DA frameworks.
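The reported gains are measured with the equal error rate (EER), the operating point at which the false-acceptance rate (FAR) equals the false-rejection rate (FRR). The sketch below is a generic, threshold-sweeping EER computation over verification trial scores. It is not the authors' evaluation code. The function names `compute_eer` and `relative_reduction` are hypothetical, and the helper assumes that the 10%–20% figure is a relative reduction of a baseline EER, which the abstract does not state explicitly.

```python
import numpy as np


def compute_eer(target_scores, nontarget_scores):
    """Approximate the equal error rate from same-speaker (target) and
    different-speaker (non-target) trial scores; higher score = more similar."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # A trial is accepted when its score is >= the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Pick the threshold where FAR and FRR are closest, then average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_reduction(baseline_eer, new_eer):
    """Hypothetical helper: relative EER reduction versus a baseline
    (assumes the improvement is reported as a relative change)."""
    return (baseline_eer - new_eer) / baseline_eer


# Usage: overlapping score distributions give a non-zero EER.
print(compute_eer([0.6, 0.4], [0.5, 0.3]))
print(relative_reduction(0.10, 0.085))
```

Sweeping only the observed scores keeps the sketch dependency-light; evaluation toolkits usually interpolate along the ROC curve instead, which gives a slightly smoother estimate on small trial lists.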
