Abstract

A Language Identification (LID) system identifies the language of a given speech utterance. Languages can be divided into tonal and non-tonal categories according to whether a change in pitch pattern changes the meaning of a word. Classifying languages into tonal and non-tonal categories before the individual language identification stage reduces the complexity of the LID system. Although state-of-the-art systems use prosodic features for this purpose, this work analyses the performance of spectral features for tonal versus non-tonal classification of languages. Four spectral feature sets are compared: Mel Frequency Cepstral Coefficients (MFCC), MFCC with Shifted Delta Cepstral (SDC) coefficients, Mean Hilbert Envelope Coefficients (MHEC), and MHEC with SDC coefficients. Experiments were performed on the Oregon Graduate Institute Multilingual Telephone Speech Corpus (OGI-MLTS) and the NITS Language database using Gaussian Mixture Model-Universal Background Model (GMM-UBM) modelling. Results show that MHEC with SDC and MFCC with SDC features, at the syllabic level, give comparable performance of 33.97% Equal Error Rate (EER) for this classification task.
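
To illustrate how SDC coefficients extend a base cepstral feature such as MFCC or MHEC, the following is a minimal sketch of the standard shifted-delta computation. The function name `shifted_delta_cepstra`, the edge-padding strategy, and the default parameters (d=1, P=3, k=7, a configuration commonly used in LID work) are assumptions for illustration; the abstract does not state the settings used in this study.

```python
import numpy as np

def shifted_delta_cepstra(cepstra, d=1, p=3, k=7):
    """Stack k shifted delta blocks onto each frame (illustrative sketch).

    cepstra : array of shape (T, N), one N-dim cepstral vector per frame.
    d       : half-width of the delta, delta(t) = c(t + d) - c(t - d).
    p       : shift, in frames, between consecutive delta blocks.
    k       : number of delta blocks stacked per frame.
    Returns an array of shape (T, N * k).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    num_frames = cepstra.shape[0]
    # Assumption: repeat the first/last frame so every index used below exists.
    pad_after = (k - 1) * p + d
    padded = np.pad(cepstra, ((d, pad_after), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # In padded coordinates, original frame t sits at index t + d.
        ahead = padded[2 * d + i * p : 2 * d + i * p + num_frames]  # c(t + i*p + d)
        behind = padded[i * p : i * p + num_frames]                 # c(t + i*p - d)
        blocks.append(ahead - behind)
    return np.hstack(blocks)
```

With 7 base coefficients per frame and these defaults, each frame yields a 49-dimensional SDC vector, which is typically concatenated with the base features before modelling.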
