Pitch and MFCC dependent GMM models for speaker identification systems

H Ezzaidi, J Rouat

doi:10.1109/ccece.2004.1344954

Abstract

Recently, we proposed an approach to speaker identification that jointly exploits vocal tract and glottal source information. The approach synchronously takes into account the correlation between the two sources of information. The proposed theoretical model, which uses a joint law, is presented. Some restrictions and simplifications are introduced to show the significance of this approach in a practical way. The fundamental frequency and MFCCs (Mel-frequency cepstral coefficients) are used to represent the information of the source and the vocal tract, respectively. In that earlier work, the probability density of the source was considered to obey a uniform law, and tests were carried out with only the female speakers from a telephone speech database (SPIDRE) recorded from various telephone handsets. Here, it is proposed to model the source information by a Gaussian mixture model (GMM) rather than the uniform probabilistic model. Tests were extended to all speakers of the SPIDRE database, and four systems were proposed and compared. The first is a baseline system based on the MFCCs and does not use any information from the source. The second examines only the voiced segments of the speech signal. The last two correspond to the two variants of the suggested approach: the source information is found to follow a normal distribution in one and a log-normal distribution in the other. With the proposed approach, the gain in performance is 10.5% for women, 7% for men and 8% for all speakers.
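As a rough illustration of the idea of scoring a speaker with a joint law over vocal tract and source features, the following minimal Python sketch may help. It is not the authors' implementation. It assumes the factorization log p(x, f0 | speaker) = log p(x | f0, speaker) + log p(f0 | speaker), and for brevity it further simplifies p(x | f0, speaker) to a pitch-independent diagonal-covariance GMM over the MFCC vector x, so the correlation modeled in the paper is not captured here. The pitch density is a single normal or log-normal component. All names (`score_speaker`, `identify`, the model dictionary layout) and all numeric values are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_lognormal(x, mu, var):
    """Log density of a log-normal law; mu and var describe ln(x), x > 0."""
    return log_gauss(math.log(x), mu, var) - math.log(x)

def log_gmm_diag(x, weights, means, variances):
    """Log-likelihood of vector x under a diagonal-covariance GMM."""
    comps = []
    for w, m, v in zip(weights, means, variances):
        ll = sum(log_gauss(xi, mi, vi) for xi, mi, vi in zip(x, m, v))
        comps.append(math.log(w) + ll)
    top = max(comps)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))

def score_speaker(frames, model, pitch_law="normal"):
    """Sum the joint log-likelihood over voiced frames (mfcc_vector, f0_hz)."""
    pitch_pdf = log_gauss if pitch_law == "normal" else log_lognormal
    total = 0.0
    for mfcc, f0 in frames:
        total += log_gmm_diag(mfcc, *model["mfcc_gmm"])  # vocal tract term
        total += pitch_pdf(f0, *model["f0"][pitch_law])  # source term
    return total

def identify(frames, models, pitch_law="normal"):
    """Return the name of the speaker model with the highest joint score."""
    return max(models, key=lambda s: score_speaker(frames, models[s], pitch_law))

# Toy models (hypothetical values): 2-dimensional "MFCCs", 2 mixture components.
models = {
    "spk_a": {
        "mfcc_gmm": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
        "f0": {"normal": (210.0, 400.0), "lognormal": (math.log(210.0), 0.01)},
    },
    "spk_b": {
        "mfcc_gmm": ([0.5, 0.5], [[3.0, 3.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]),
        "f0": {"normal": (120.0, 400.0), "lognormal": (math.log(120.0), 0.01)},
    },
}

# Two voiced test frames: (MFCC vector, fundamental frequency in Hz).
frames = [([0.2, -0.1], 205.0), ([0.9, 1.1], 215.0)]
best = identify(frames, models)
```

With these toy values both the MFCC term and the pitch term favor the first model, so `best` is `"spk_a"`. Passing `pitch_law="lognormal"` switches the source density to the log-normal variant. A MFCC-only baseline of the kind described above would simply drop the pitch term from `score_speaker`.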

Full Text