Abstract

A Language Identification (LID) system identifies the language of a given speech utterance. Languages can be divided into tonal and non-tonal categories according to whether the meaning of a word changes with its pitch variation. Classifying languages into these two categories before the individual language identification stage reduces the complexity of the LID system. Although state-of-the-art systems use prosodic features for this purpose, this work analyses the performance of spectral features for tonal and non-tonal classification of languages. The study compares four spectral feature sets: Mel Frequency Cepstral Coefficients (MFCC), MFCC with Shifted Delta Cepstral (SDC) coefficients, Mean Hilbert Envelope Coefficients (MHEC), and MHEC with SDC coefficients. Experiments were performed on the Oregon Graduate Institute Multilingual Telephone Speech Corpus (OGI-MLTS) and the NITS Language database using the GMM-UBM modelling technique. Results show that MHEC with SDC and MFCC with SDC features, at the syllabic level, give comparable performance, with an Equal Error Rate (EER) of 33.97% for this classification task.
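For readers unfamiliar with the reported metric, the following is a minimal sketch of how an Equal Error Rate can be computed from classifier scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the label convention (1 for tonal, 0 for non-tonal), and the example scores are all assumptions made for illustration. The EER is taken at the threshold where the false acceptance rate and the false rejection rate are closest.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Approximate EER for binary scores (hypothetical helper, not from the paper).

    scores: higher means "more likely target class" (assumed here: tonal).
    labels: 1 for target trials, 0 for non-target trials (assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    target = scores[labels == 1]
    nontarget = scores[labels == 0]

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "reject everything" operating point is also considered.
    thresholds = np.unique(scores)
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)

    # False acceptance: non-target trials scored at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False rejection: target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0


# Made-up scores for three tonal and three non-tonal utterances.
example_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
eer = equal_error_rate(example_scores, example_labels)
print(f"EER = {100 * eer:.2f}%")
```

A lower EER indicates better separation of the two classes; an EER of 50% corresponds to chance for a two-class task.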
