Abstract

Speaker identification systems aim to determine a speaker's identity from a set of speech parameters. A relevant speech representation is therefore required. For this purpose, we propose combining spectral parameters, namely the Mel-frequency cepstral coefficients (MFCC) and the perceptual linear predictive (PLP) coefficients, with a prosodic parameter, the fundamental frequency (F0) of the signal. F0 estimation methods fall into two main classes: temporal and spectral. We use the sawtooth waveform inspired pitch estimator (SWIPE) algorithm, which estimates pitch in the frequency domain. For modelling, we evaluate the Gaussian mixture model-universal background model (GMM-UBM) approach. Experiments are carried out on the TIMIT database. The identification rates are promising and show the benefit of combining MFCC and PLP over using either feature alone, particularly on noisy data.
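
The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not say how the features are combined or how the models are trained, so the following rests on assumptions: the combination is taken to be a frame-level concatenation of MFCC, PLP and F0, the GMMs use diagonal covariances, and scoring is an average log-likelihood ratio against the UBM. All function names (`fuse_frames`, `llr_score`, `identify`) and the toy model parameters are hypothetical. Feature extraction (including SWIPE) and the adaptation of speaker models from the UBM are not shown.

```python
import math

def fuse_frames(mfcc, plp, f0):
    """Concatenate per-frame MFCC and PLP vectors with that frame's F0 value.

    Assumption: the feature streams are already aligned frame by frame.
    """
    if not (len(mfcc) == len(plp) == len(f0)):
        raise ValueError("all feature streams must have the same number of frames")
    return [list(m) + list(p) + [f] for m, p, f in zip(mfcc, plp, f0)]

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) triples."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(l - peak) for l in logs))

def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio of a speaker model against the UBM."""
    total = sum(
        gmm_log_likelihood(x, speaker_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)

def identify(frames, speaker_gmms, ubm):
    """Return the name of the enrolled speaker with the highest score."""
    return max(speaker_gmms, key=lambda name: llr_score(frames, speaker_gmms[name], ubm))

# Toy data: 2 frames, 2 MFCCs + 1 PLP coefficient + F0 (illustrative values only).
frames = fuse_frames([[1.0, 2.0], [1.1, 2.1]], [[3.0], [3.1]], [100.0, 101.0])
ubm = [(1.0, [0.0, 0.0, 0.0, 0.0], [100.0, 100.0, 100.0, 100.0])]
speakers = {
    "A": [(1.0, [1.0, 2.0, 3.0, 100.0], [1.0, 1.0, 1.0, 1.0])],
    "B": [(1.0, [5.0, 5.0, 5.0, 200.0], [1.0, 1.0, 1.0, 1.0])],
}
best = identify(frames, speakers, ubm)
```

In closed-set identification the UBM term is the same for every candidate speaker, so it does not change the ranking in this sketch; in a GMM-UBM system its main role is as the starting point from which speaker models are adapted.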
