GMM and CNN Hybrid Method for Short Utterance Speaker Recognition

Zheli Liu, Tong Li, Zhendong Wu, Jin Li, Chao Shen

doi:10.1109/tii.2018.2799928

Abstract

Over the last few years, speaker recognition has attracted wide interest because of its extensive applications in fields such as speech communications, domestic services, and smart terminals. As a critical method, the Gaussian mixture model (GMM) makes it possible to achieve recognition capability close to that of human hearing on long speech. However, the GMM fails to recognize speakers from short utterances with high accuracy. To solve this problem, in this paper we propose a novel model that enhances the accuracy of short utterance speaker recognition systems. Unlike traditional models based on the GMM, our method trains a convolutional neural network to process spectrograms, which describe speakers better. As a result, the recognition system achieves considerable accuracy with a reasonable convergence speed. The experimental results show that our model reduces the equal error rate of recognition from 4.9% to 2.5%.
