A text-independent speaker verification system using support vector machines classifier

Yong Gu, Trevor Thomas

doi:10.21437/eurospeech.2001-413

Abstract

In recent years, the technology for speaker verification or call authentication has received an increasing amount of attention in the IVR industry. However, due to the complexity of the speaker information embedded in speech signals, current technology still cannot produce the verification accuracy that some applications require. In this paper we introduce a new pattern classification approach, support vector machines (SVM), for text-independent speaker verification. The SVM is a new approach to statistical learning based on the principle of structural risk minimisation. Various evaluation results for the SVM verification system are presented, together with a comparison with a baseline GMM approach. The results demonstrate that the SVM approach performs much better than the GMM approach: on the same training and testing data set, the SVM approach gives an EER of 1.2% versus 3.9% for the GMM approach.

1. Introduction

In recent years, the technology for speaker verification (SV) or call authentication has received an increasing amount of attention in the IVR industry. However, due to the complexity of the speaker information embedded in speech signals, current technologies such as HMM, GMM, and ANN still cannot produce the verification accuracy that some applications require. In this paper we introduce a new approach to this problem, support vector machines (SVM). The SVM is a learning technique introduced by V. Vapnik [1]. It can be seen as a new approach to statistical learning based on the principle of structural risk minimisation. The explicit noise description in the approach and the possibility of using a non-linear kernel in the dual representation make this method very attractive in many pattern recognition areas. The technique has been applied in computer vision and other areas. Recently, some work has shown that the algorithm can achieve better phoneme classification accuracy than some conventional methods for speech processing [2][3].
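For orientation, the dual representation with a non-linear kernel mentioned above is usually written in the following standard textbook form. The symbols (the multipliers \alpha_i, labels y_i \in \{-1, +1\}, bias b, kernel K, and width \sigma) are assumptions introduced here and are not taken from the paper, whose notation and kernel choices may differ. The Gaussian kernel is shown only as one common non-linear example.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\frac{\lVert \mathbf{x}_i - \mathbf{x} \rVert^{2}}{2\sigma^{2}} \right)
```

Only training vectors with \alpha_i > 0 (the support vectors) contribute to the sum, so the decision depends on the data solely through kernel evaluations.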
This paper presents a text-independent SV system using the SVM approach. Some alternatives for the kernel functions and decision functions are discussed, and evaluation results are presented. The Gaussian mixture model (GMM) technique, one of the most popular approaches for text-independent SV, is used as a baseline in our evaluations, and a comparison between the SVM and GMM approaches is given. The results demonstrate that the SVM algorithm performs much better than the baseline GMM: on the same training and testing data set, the SVM approach gives an equal error rate (EER) of 1.2% versus 3.9% for the GMM.
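The equal error rate quoted above is the operating point at which the false-acceptance rate (impostors accepted) equals the false-rejection rate (true speakers rejected). The following is a minimal sketch of how an EER can be estimated from verification scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper, which may compute the EER differently (for example by interpolating between thresholds).

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-acceptance
    and false-rejection rates are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, best_eer, best_thr = None, None, None
    for thr in thresholds:
        # False acceptance: impostor trials scoring at or above the threshold.
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        # False rejection: true-speaker trials scoring below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates at the closest crossing.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2.0, thr
    return best_eer, best_thr


# Toy scores, made up purely for illustration (not data from the paper).
true_speaker = [2.0, 1.5, 1.0, 0.2]
impostor = [-1.0, -0.5, 0.0, 0.5]
eer, thr = equal_error_rate(true_speaker, impostor)
print(f"EER = {eer:.2%} at threshold {thr}")
```

With only a handful of trials the two error rates change in coarse steps, so the sketch reports the midpoint at the closest crossing rather than an exact equality.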

Full Text