A CNN-BiLSTM based hybrid model for Indian language identification

Himanish Shekhar Das, Pinki Roy

doi:10.1016/j.apacoust.2021.108274

Abstract

Automatic language identification (LID) is the task of detecting the language of an utterance from the linguistic content of speech produced by an unknown speaker. In the multilingual society of India, the ability to identify and classify a spoken language is an important capability. In this work, a hybrid model that combines a convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) network is proposed for Indian language identification, with special emphasis on Northeastern languages. Speech samples are first transformed into spectrogram images, which represent the signal across multiple frequency bands with respect to time. Deep linguistic features are then extracted from the spectrogram images using CNNs, and a BiLSTM network is used to capture additional temporal information for language identification. On top of the BiLSTM, an attention mechanism is used to find the context for each frame. Finally, a softmax function is applied to compute a score for each language. The simulation is conducted on a recorded database consisting of four Indian languages: Assamese, Bengali, Indian English, and Hindi. The performance of the proposed LID model is tested with two different CNN architectures, ResNet-50 and VGG-16. Simulation results show that the ResNet-50 based model achieves an accuracy of up to 98.10%, compared with 97.70% for the VGG-16 based model.
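
The last two stages described above (attention over frames, then softmax scoring) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the attention form, so dot-product attention against a single context vector is assumed here, and the names `attention_pool`, `language_scores`, `context`, `class_weights`, and `class_bias` as well as all numeric values are hypothetical. The frame vectors stand in for BiLSTM outputs.

```python
import math

# The four languages named in the abstract.
LANGUAGES = ["Assamese", "Bengali", "Indian English", "Hindi"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(frames, context):
    """Assumed attention form: weight each frame by softmax(frame . context),
    then return the weighted sum of frames together with the weights."""
    weights = softmax([dot(frame, context) for frame in frames])
    dim = len(frames[0])
    pooled = [sum(w * frame[i] for w, frame in zip(weights, frames))
              for i in range(dim)]
    return pooled, weights


def language_scores(frames, context, class_weights, class_bias):
    """Linear layer on the pooled vector, then softmax over the languages."""
    pooled, _ = attention_pool(frames, context)
    logits = [dot(row, pooled) + b for row, b in zip(class_weights, class_bias)]
    return dict(zip(LANGUAGES, softmax(logits)))


# Toy stand-ins for per-frame BiLSTM outputs (three frames, two features).
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_context = [0.5, 0.5]  # hypothetical learned context vector
toy_class_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
toy_class_bias = [0.0, 0.0, 0.0, 0.0]

toy_scores = language_scores(toy_frames, toy_context,
                             toy_class_weights, toy_class_bias)
```

In this sketch the scores form a probability distribution over the four languages, and the predicted language is the key with the highest score, for example `max(toy_scores, key=toy_scores.get)`. In a trained model the context vector and classifier weights would be learned rather than set by hand.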

Full Text