Abstract
This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, is robust against noise and session changes and has therefore become a standard feature for speaker recognition. Our main aim is to test the accuracy of the proposed feature set for different values of frame overlap and MFCC feature vector size, in order to identify the configuration with the highest accuracy. Principal component analysis (PCA) is applied before the training and testing stages to reduce feature dimensionality, which increases computing speed and lowers the memory required for processing. Using a probabilistic neural network (PNN) for modeling has the advantage of shorter training times. The experiments examine the percentage identification accuracy (PIA) of three feature sets: MFCC alone, the combination of MFCC and DMFCC, and the combination of MFCC, DMFCC and DDMFCC. The proposed feature set, which combines all three, attains an identification accuracy of 94% for a frame overlap of 90% and an MFCC feature size of 18 coefficients, outperforming the other two feature sets. The experiments were conducted on the Voxforge database.
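As an illustration of the feature pipeline described above, the following is a minimal sketch of how delta and delta-delta coefficients might be appended to an MFCC matrix and then reduced with PCA. It is not the authors' implementation: the regression window (`width=2`), the number of retained components (20), the function names, and the random matrix standing in for real MFCC frames are all assumptions made for the example. The Gaussian filter bank and the PNN classifier are not shown.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta over time; features has shape (frames, coeffs).

    The window half-width of 2 frames is an assumption, not a value
    taken from the paper.
    """
    num_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours per side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def stack_features(mfcc):
    """Concatenate MFCC, DMFCC and DDMFCC along the coefficient axis."""
    dmfcc = delta(mfcc)
    ddmfcc = delta(dmfcc)
    return np.hstack([mfcc, dmfcc, ddmfcc])

def pca_reduce(x, n_components):
    """Project x onto its leading principal components (computed via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical input: 200 frames of 18 MFCCs (random stand-in data).
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 18))
combined = stack_features(mfcc)      # shape (200, 54)
reduced = pca_reduce(combined, 20)   # shape (200, 20)
```

With 18 MFCCs per frame, the combined vector has 54 coefficients before reduction, which is the kind of growth in dimensionality that PCA is meant to counter.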