Abstract

In many i-vector based speaker recognition frameworks, the key challenge is to develop effective channel compensation methods that enlarge inter-class differences while reducing intra-class variations. This challenge is addressed with a discriminatively learned network (DLN), which uses both speaker classification and verification signals as supervision. The speaker classification task pushes the embeddings (vectors mapped from i-vectors) of different identities apart, increasing the inter-class variation, while the verification task pulls the embeddings of the same identity together, reducing the intra-class variation. The DLN thus projects i-vectors into a more discriminative embedding space, and verification scores are computed as cosine similarities between the embeddings. The learned DLN generalises well to new speakers unseen in the training data. On the challenging text-dependent Robust Speaker Recognition (RSR2015) database, performance is significantly improved compared with the linear discriminant analysis (LDA) and Gaussian probabilistic LDA methods.
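The scoring step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the projection `W`, the dimensions, and the function names are all hypothetical, and the real DLN is a trained network rather than a random linear map. It only shows how embeddings mapped from i-vectors would be compared with cosine similarity.

```python
import numpy as np

def project(ivector, W):
    """Map an i-vector to the embedding space (stand-in for the trained DLN)."""
    return W @ ivector

def cosine_score(e1, e2):
    """Verification score: cosine similarity between two embeddings, in [-1, 1]."""
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

# Toy example: 400-dim i-vectors projected to 128-dim embeddings
# (dimensions chosen for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 400))
enroll_ivec = rng.standard_normal(400)
test_ivec = rng.standard_normal(400)
score = cosine_score(project(enroll_ivec, W), project(test_ivec, W))
```

A trial is then accepted or rejected by thresholding `score`; training the projection with the joint classification and verification objectives is what makes same-speaker pairs score high and different-speaker pairs score low.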
