Effective preprocessing of speech and acoustic features extraction for spoken language identification

Abhijeet Kumar, N Sakthivel, H Hemani, S Chaturvedi

doi:10.1109/icstm.2015.7225394

Abstract

Language identification (LID) systems have become popular and indispensable in multilingual speech processing applications, where they serve as a preprocessing step for machine systems and human interfaces. Given a speech utterance, the system predicts the most likely language. The proposed LID system is based on Gaussian mixture models (GMMs): one language model per language is trained generatively on the acoustic features of that language. The acoustic approach requires only the digitized speech utterances and their language labels, which makes it computationally less expensive than alternative approaches that also require phonetic transcription of the speech. This paper investigates different preprocessing techniques for noise removal, speech activity detection (SAD), speaker normalization and channel normalization. It also illustrates the extraction procedure for cepstral features, which capture the phonetic characteristics of the signal. We give a comprehensive review of current trends in feature extraction and compare their results. Notably, shifted delta cepstral (SDC) features, a quintessential feature for LID systems derived from Mel-frequency cepstral coefficients (MFCC), have been successfully tested with a GMM-based classifier. A comparative study of MFCC and SDC features for LID has been conducted and is presented.
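
To make the relationship between MFCC and SDC features concrete, the following is a minimal sketch of how SDC vectors are commonly derived from a matrix of cepstral frames. It is not taken from the paper: the function name `shifted_delta_cepstra`, the default parameters (delta spread d=1, block shift P=3, k=7 blocks, often written as the 7-1-3-7 configuration when 7 cepstral coefficients are used), and the edge-padding strategy are all assumptions chosen for illustration. The MFCC matrix is assumed to have been computed beforehand by some other tool.

```python
import numpy as np

def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k delta blocks per frame (illustrative sketch, not the paper's code).

    cepstra: array of shape (T, N), one row of N cepstral coefficients per frame.
    Block i at frame t is c(t + i*p + d) - c(t + i*p - d), for i = 0..k-1.
    Returns an array of shape (T, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]

    # Assumption: repeat the first/last frame at the edges so every
    # shifted index is valid and the output keeps T rows.
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((d, pad_after), (0, 0)), mode="edge")

    blocks = []
    for i in range(k):
        # In padded coordinates, original frame t sits at index t + d.
        ahead = padded[i * p + 2 * d : i * p + 2 * d + num_frames]
        behind = padded[i * p : i * p + num_frames]
        blocks.append(ahead - behind)

    # Concatenate the k delta blocks along the feature axis.
    return np.hstack(blocks)
```

With 7 cepstral coefficients per frame and these defaults, each frame yields a 49-dimensional SDC vector that spans a longer temporal context than a single MFCC frame, which is the usual motivation for using SDC in LID.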

Full Text