Indian language identification using time-frequency image textural descriptors and GWO-based feature selection

Amit A Chowdhury, Vaibhav S Borkar, Gajanan K Birajdar

doi:10.1080/0952813x.2019.1631392

Abstract

The ability to categorise and recognise a spoken language is essential in a multilingual society like India. Language identification (LID) is the process of identifying the language spoken by an unknown speaker from a given speech sample. In this article, textural descriptors extracted from the spectrogram image, combined with evolutionary feature selection, are presented for Indian language identification. Language-specific long-term cues and prosodic information present in various frequency zones of the spectrogram image can be efficiently modelled using textural descriptors. First, an input audio sample is converted into a spectrogram, a visual representation that characterises the frequency content of a signal with respect to time. Then, completed local binary pattern (CLBP), local binary pattern histogram Fourier (LBPHF) and discrete wavelet transform-based texture descriptors are used to extract features from the spectrogram image. Next, grey wolf optimiser (GWO) feature selection removes irrelevant and redundant features so that only the optimal features are retained. GWO-based feature selection thus allows the classification model to be built on the optimal features, which optimises the performance of the classifier. Finally, using an artificial neural network classifier and the Indic-TTS database, an accuracy of 96.9659% was obtained.
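The first two stages described above (audio to spectrogram, spectrogram to texture histogram) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the article: the function names `log_spectrogram` and `lbp_histogram`, the frame length and hop size, the synthetic test signal, and above all the use of a basic 8-neighbour LBP in place of the CLBP, LBPHF and wavelet descriptors the authors use. GWO feature selection and the neural network classifier are not shown.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a Hann-windowed short-time FFT.
    frame_len and hop are illustrative values, not taken from the article."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T  # shape: (frequency bins, time frames)

def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes.
    A simplified stand-in for CLBP/LBPHF, used here only for illustration."""
    rows, cols = image.shape
    centre = image[1:-1, 1:-1]
    # Neighbour offsets (row, column), visited clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = image[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        # Set this bit wherever the neighbour is at least as large as the centre.
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Synthetic one-second signal at 16 kHz standing in for a speech sample.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(t.size)

spec = log_spectrogram(signal)   # time-frequency image
features = lbp_histogram(spec)   # texture feature vector for this sample
```

In a full pipeline of the kind the abstract describes, one such feature vector per utterance would be passed to a feature selector and then to a classifier.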

Full Text