Fusion features for robust speaker identification

Ines Ben Fredj, Youssef Zouhir, Kaïs Ouni

doi:10.1504/ijsise.2018.10013027

Abstract

Speaker identification systems aim to determine a speaker's identity from a set of speech parameters, so a relevant speech representation is required. To this end, we propose combining spectral parameters, namely the Mel-frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients, with a prosodic parameter, the fundamental frequency (F0) of the signal. F0 estimation methods fall into two main classes: temporal and spectral. We use the sawtooth waveform inspired pitch estimator (SWIPE) algorithm, which estimates the pitch in the frequency domain. For modelling, we evaluate the Gaussian mixture model-universal background model (GMM-UBM). Experiments are carried out on the TIMIT database. The identification rates are promising and show the benefit of combining MFCC and PLP over using each feature separately, mainly for noisy data.
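
The combination of features described above can be illustrated with a minimal sketch. It assumes that fusion is done at the frame level by concatenating the MFCC, PLP and F0 values of each frame into one vector; the abstract does not state the exact fusion scheme, so this is an assumption rather than the authors' implementation. The function name fuse_frames and the toy values are hypothetical, and the feature streams are assumed to be already extracted and aligned to the same number of frames.

```python
from typing import List, Sequence


def fuse_frames(
    mfcc: Sequence[Sequence[float]],
    plp: Sequence[Sequence[float]],
    f0: Sequence[float],
) -> List[List[float]]:
    """Concatenate per-frame MFCC, PLP and F0 values (assumed fusion scheme).

    Each input holds one entry per frame. The output holds one fused
    vector per frame: [mfcc..., plp..., f0].
    """
    # All streams must describe the same frames, otherwise fusion is ill-defined.
    if not (len(mfcc) == len(plp) == len(f0)):
        raise ValueError("all feature streams must have the same number of frames")
    return [
        list(m) + list(p) + [float(pitch)]
        for m, p, pitch in zip(mfcc, plp, f0)
    ]


# Toy example: 2 frames, 3 MFCC and 2 PLP coefficients per frame (made-up values).
toy_mfcc = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
toy_plp = [[0.1, 0.2], [0.3, 0.4]]
toy_f0 = [120.0, 0.0]  # 0.0 used here as a placeholder for an unvoiced frame

fused = fuse_frames(toy_mfcc, toy_plp, toy_f0)
```

In this sketch the fused vectors would then be passed to the modelling stage (GMM-UBM in the paper) in place of the individual feature streams.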
