Abstract

An ability to categorise and recognise a spoken language is essential in a multilingual society such as India. Language identification (LID) is the task of identifying the language spoken by an unknown speaker from a given speech sample. In this article, an approach to Indian language identification based on textural descriptors extracted from spectrogram images and evolutionary feature selection is presented. Textural descriptors can efficiently model the language-specific long-term cues and prosodic information present in various frequency zones of the spectrogram image. First, an input audio sample is converted into a spectrogram, a visual representation that characterises the frequency content of a signal over time. Then, completed local binary pattern (CLBP), local binary pattern histogram Fourier (LBPHF) and discrete wavelet transform (DWT) based texture descriptors are used to extract features from the spectrogram image. Next, grey wolf optimiser (GWO) based feature selection removes irrelevant and redundant features, so that only an optimal subset is retained. GWO-based feature selection thus helps construct a classification model from optimal features, optimising the performance of the classifier. Finally, an artificial neural network (ANN) classifier achieved an accuracy of 96.9659% on the Indic-TTS database.
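To illustrate the texture-descriptor step, the following is a minimal Python sketch of a simplified CLBP encoder. It assumes the spectrogram has already been computed and quantised into a 2D grid of grey levels. The function names `clbp_codes` and `clbp_histogram` are hypothetical, and this is not the authors' implementation. For brevity it uses 8 neighbours at radius 1 without interpolation, omits the rotation-invariant uniform mapping, and concatenates the sign (S), magnitude (M) and centre (C) histograms instead of forming a joint histogram.

```python
from typing import List, Tuple

# Square 8-neighbourhood at radius 1 (no interpolation), ordered
# counter-clockwise from the east neighbour; bit p of a code uses OFFSETS[p].
OFFSETS: List[Tuple[int, int]] = [
    (0, 1), (-1, 1), (-1, 0), (-1, -1),
    (0, -1), (1, -1), (1, 0), (1, 1),
]


def clbp_codes(img: List[List[float]]) -> Tuple[List[int], List[int], List[int]]:
    """Return simplified CLBP_S, CLBP_M and CLBP_C codes for every interior pixel."""
    rows, cols = len(img), len(img[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    centres = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    # Local differences d_p = g_p - g_c around each interior centre pixel.
    diffs = [[img[r + dr][c + dc] - img[r][c] for dr, dc in OFFSETS]
             for r, c in centres]
    # CLBP_M threshold: mean |d_p| over all interior pixels of the image.
    mean_mag = sum(abs(d) for ds in diffs for d in ds) / (len(diffs) * len(OFFSETS))
    # CLBP_C threshold: mean grey level of the whole image.
    mean_grey = sum(sum(row) for row in img) / (rows * cols)

    s_codes, m_codes, c_codes = [], [], []
    for (r, c), ds in zip(centres, diffs):
        # Sign component: bit set where the neighbour is >= the centre (classic LBP).
        s_codes.append(sum(1 << p for p, d in enumerate(ds) if d >= 0))
        # Magnitude component: bit set where |d_p| reaches the global mean magnitude.
        m_codes.append(sum(1 << p for p, d in enumerate(ds) if abs(d) >= mean_mag))
        # Centre component: 1 if the centre pixel is at least the image mean.
        c_codes.append(1 if img[r][c] >= mean_grey else 0)
    return s_codes, m_codes, c_codes


def clbp_histogram(img: List[List[float]]) -> List[float]:
    """Concatenate normalised S (256 bins), M (256 bins) and C (2 bins) histograms."""
    s_codes, m_codes, c_codes = clbp_codes(img)
    n = len(s_codes)
    hist = [0.0] * (256 + 256 + 2)
    for code in s_codes:
        hist[code] += 1 / n
    for code in m_codes:
        hist[256 + code] += 1 / n
    for code in c_codes:
        hist[512 + code] += 1 / n
    return hist
```

For example, `clbp_codes([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` returns `([225], [204], [1])`. Four neighbours are at least as bright as the centre value 5 (S), four differ from it by at least the mean magnitude 2.5 (M), and the centre equals the image mean 5 (C). A fixed-length vector like the one from `clbp_histogram` is the kind of input that a feature-selection step such as GWO would then prune.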
