Abstract

One of the most widely used approaches to training self-supervised speaker verification systems is to optimize the speaker embedding network discriminatively using pseudo-labels generated by a clustering algorithm. Although this pseudo-label-based self-supervised training scheme has shown impressive performance, recent studies have demonstrated that label noise can significantly degrade performance. In this paper, we explore pseudo-labels produced by different clustering algorithms and conduct a fine-grained analysis of the relationship between pseudo-label quality and speaker verification performance. Experimentally, we shed light on several previously overlooked aspects of the pseudo-labels that can impact speaker verification performance. Moreover, we observe that self-supervised speaker verification performance depends heavily on multiple qualitative aspects of the clustering algorithm used to generate the pseudo-labels. Furthermore, we show that speaker verification performance can be severely degraded by overfitting to the noisy pseudo-labels, and that a mixup strategy can mitigate the memorization effects of label noise.
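
The mixup strategy referenced above interpolates pairs of training examples and their (pseudo-)label targets so the network is discouraged from memorizing individual noisy labels. The following is a minimal sketch of how such a step could look for a speaker embedding classifier; the function name, tensor shapes, and the Beta-distribution parameter `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F


def mixup_batch(features, pseudo_labels, num_classes, alpha=1.0):
    """Mix pairs of utterance features and their clustering-derived pseudo-labels.

    features:      (batch, ...) input tensor, e.g. acoustic features per utterance.
    pseudo_labels: (batch,) integer cluster assignments used as pseudo-labels.
    Returns mixed features and soft label targets for a cross-entropy-style loss.
    """
    # Sample the mixing coefficient from a Beta distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Pair each example with a randomly permuted partner from the same batch.
    perm = torch.randperm(features.size(0))

    mixed_features = lam * features + (1.0 - lam) * features[perm]

    # Convert hard pseudo-labels to one-hot vectors and mix them the same way,
    # yielding soft targets that dilute the influence of any single noisy label.
    one_hot = F.one_hot(pseudo_labels, num_classes).float()
    mixed_targets = lam * one_hot + (1.0 - lam) * one_hot[perm]

    return mixed_features, mixed_targets
```

In use, the mixed features would be fed through the speaker embedding network and the loss computed against the soft `mixed_targets` (e.g. a soft-target cross-entropy), rather than against the raw cluster assignments.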
