Abstract
Language identification (LID) is the identification of an unknown language by comparing the speech biometrics of a test speech sample against language models accumulated beforehand. This paper presents and encourages the use of hybrid robust feature extraction techniques for a spoken language identification system. In the feature extraction stage, several techniques are first applied individually: Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) features, and relative spectral perceptual linear prediction (RASTA-PLP) features. The performance of the LID system is then investigated for several combinations of these features (hybrid features), including MFCC and PLP combined with their first-order derivatives, MFCC + RASTA-PLP, and MFCC + SDC (shifted delta cepstral coefficients). The language identification (classification) phase uses a feed-forward back-propagation neural network (FFBPNN), and two learning algorithms are compared: Levenberg–Marquardt (“trainlm”) and scaled conjugate gradient (“trainscg”). The hybrid feature extraction techniques are compared with their individual counterparts in terms of performance. The results clearly indicate that hybrid features trained with the “trainlm” learning algorithm outperform their individual counterparts. Among the hybrid techniques, MFCC-RASTA-PLP gives the most promising results, with an overall accuracy of 94.6% and a minimum test error rate of 0.10. The efficiency of the proposed hybrid approaches is determined through several simulated experiments on a user-defined language database of speech signals, carried out in MATLAB.
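As a rough illustration of what a hybrid feature vector with first-order derivatives can look like, the sketch below appends delta coefficients to a base feature stream and concatenates a second stream frame by frame. It is a minimal sketch under stated assumptions, not the paper's MATLAB implementation: the function names `delta` and `combine`, the simple two-point difference, and the toy numbers standing in for MFCC and RASTA-PLP frames are all hypothetical.

```python
def delta(frames):
    """First-order derivative per frame: d[t] = (c[t+1] - c[t-1]) / 2.

    Edge frames are repeated at the boundaries (an assumption; other
    padding or regression-window schemes are also common).
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def combine(*streams):
    """Concatenate the per-frame vectors of several feature streams."""
    n = len(streams[0])
    if any(len(s) != n for s in streams):
        raise ValueError("all feature streams must have the same number of frames")
    return [[v for s in streams for v in s[t]] for t in range(n)]


# Toy stand-ins for real features: 3 frames, 2 "MFCC" and 1 "RASTA-PLP" value each.
mfcc = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
rasta_plp = [[0.5], [0.25], [0.125]]

# Hybrid vector per frame: MFCC, MFCC deltas, RASTA-PLP.
hybrid = combine(mfcc, delta(mfcc), rasta_plp)
```

Each row of `hybrid` would then serve as one input vector to the classifier; in practice the base features would come from a real front end rather than hand-written lists.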