Abstract

This paper presents a two-stage Indian language identification (TS-LID) system consisting of a tonal/non-tonal pre-classification module followed by individual language identification modules. It studies the effectiveness of mean Hilbert envelope coefficients (MHECs) and Mel-frequency cepstral coefficients (MFCCs), and of their combinations with prosodic features, in the TS-LID context. Both glottal closure instant (GCI)-based approaches and the block processing (BP) approach are explored for feature extraction, along with different analysis units, namely the whole utterance and the syllable. Various state-of-the-art modeling techniques are analyzed. Experiments are carried out on the NIT Silchar language database (NITS-LD) and the OGI-Multilingual database (OGI-MLTS). The results suggest that at the pre-classification stage, for NITS-LD, the deep neural network (DNN) with syllable-level features extracted using GCI-based approaches provides the highest accuracies of 90.6%, 85% and 81.3% for 30 s, 10 s and 3 s test data respectively. The GCI-based approaches outperform the BP method by as much as 7.5%, 6.2%, and 5.7%. The pre-classification module improves the performance of the LID system by as much as 5.7%, 4.4% and 2.2% for 30 s, 10 s and 3 s test data respectively. The corresponding improvements for the OGI-MLTS database are 7.4%, 6.8%, and 5%.
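The two-stage routing described above can be illustrated with a minimal sketch. It is not the system evaluated in the paper: a nearest-centroid classifier stands in for the DNN and other models, the 2-D feature vectors are toy values rather than MHEC, MFCC, or prosodic features, and the names `NearestCentroid`, `TwoStageLID`, `lang_T1`, `lang_T2`, `lang_N1`, and `lang_N2` are hypothetical.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


class NearestCentroid:
    """Toy stand-in classifier (assumption: the paper uses DNNs and other models)."""

    def fit(self, samples):
        # samples: dict mapping label -> list of feature vectors
        self.centroids = {label: centroid(vs) for label, vs in samples.items()}
        return self

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


class TwoStageLID:
    """Stage 1 picks tonal vs. non-tonal; stage 2 picks a language within that group."""

    def fit(self, groups):
        # groups: dict mapping group name -> dict mapping language -> feature vectors
        pooled = {
            group: [v for vs in langs.values() for v in vs]
            for group, langs in groups.items()
        }
        self.pre_classifier = NearestCentroid().fit(pooled)
        self.language_classifiers = {
            group: NearestCentroid().fit(langs) for group, langs in groups.items()
        }
        return self

    def predict(self, x):
        group = self.pre_classifier.predict(x)
        language = self.language_classifiers[group].predict(x)
        return group, language


# Toy training data with hypothetical language labels and 2-D features.
training = {
    "tonal": {
        "lang_T1": [(5.0, 5.0), (5.2, 4.8), (4.8, 5.2)],
        "lang_T2": [(5.0, 9.0), (5.1, 9.1), (4.9, 8.9)],
    },
    "non_tonal": {
        "lang_N1": [(0.0, 0.0), (0.2, -0.2), (-0.2, 0.2)],
        "lang_N2": [(0.0, 4.0), (0.1, 4.1), (-0.1, 3.9)],
    },
}

model = TwoStageLID().fit(training)
print(model.predict((4.8, 8.7)))
print(model.predict((0.2, 0.1)))
```

Because the second stage only has to separate languages within one group, each language classifier faces a smaller set of candidates. The cost is that an error at the pre-classification stage cannot be recovered later.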
