Abstract

End-to-end speaker embedding systems have shown promising performance on speaker verification tasks. Traditional end-to-end systems typically adopt the softmax loss as the training criterion, which is not discriminative enough to train strong models. In this paper, we adapt the additive margin softmax (AM-Softmax) loss, originally proposed for face verification, to speaker embedding systems. Furthermore, we propose a novel ensemble loss, the ensemble additive margin softmax (EAM-Softmax) loss, which integrates the Hilbert-Schmidt independence criterion (HSIC) into a speaker embedding system trained with the AM-Softmax loss. Experiments on the large-scale VoxCeleb dataset show that the AM-Softmax loss outperforms traditional loss functions, and that systems using the EAM-Softmax loss surpass existing speaker verification methods, achieving state-of-the-art performance.
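To make the AM-Softmax idea concrete, the following is a minimal NumPy sketch (not the authors' implementation; the scale `s=30.0` and margin `m=0.35` are illustrative values): embeddings and class weights are L2-normalized so logits become cosine similarities, a fixed margin `m` is subtracted from the target-class cosine only, and standard cross-entropy is applied to the scaled result.

```python
import numpy as np

def am_softmax_loss(embeddings, weights, labels, s=30.0, m=0.35):
    """AM-Softmax loss sketch: s and m are illustrative hyperparameters."""
    # L2-normalize embeddings (rows) and class weights (columns)
    # so that logits are cosine similarities in [-1, 1].
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = e @ w                                  # (batch, num_classes)
    idx = np.arange(len(labels))
    # Subtract the additive margin m from the target-class cosine only,
    # then scale all logits by s.
    logits = s * cos
    logits[idx, labels] = s * (cos[idx, labels] - m)
    # Numerically stable cross-entropy over the margin-adjusted logits.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, labels].mean()
```

Because the margin shrinks only the target-class logit, the objective is strictly harder than plain softmax cross-entropy on the same cosine logits, which is what pushes same-speaker embeddings into a tighter angular region.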
