Abstract

Spoken Language Identification (SLID) assigns a language label to the speech in an audio file. This paper proposes an approach based on Convolutional Neural Networks (CNNs) for the automatic identification of four Indian languages: Bengali, Gujarati, Tamil and Telugu. The classifier is trained on 5 hours of audio from each of the four languages. The CNN operates on MFCC spectrogram images generated from short splits, two to four seconds long, of raw audio input with varying audio quality and noise characteristics. The paper also analyzes the performance of the SLID system as a function of the duration of the training and test audio samples. The proposed CNN model achieves 88.82% accuracy, the best result among the machine learning models it is compared with.
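
The splitting step described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_into_segments`, the 16 kHz sample rate, and the choice to use non-overlapping segments and drop the trailing remainder are all assumptions made for illustration, since the abstract does not specify them. Each resulting segment would then be converted to an MFCC spectrogram image before being passed to the CNN.

```python
def split_into_segments(samples, sample_rate, segment_seconds):
    """Cut a 1-D sequence of audio samples into equal-length segments.

    Assumptions (not stated in the abstract): segments do not overlap,
    and a trailing remainder shorter than one segment is discarded.
    """
    if sample_rate <= 0 or segment_seconds <= 0:
        raise ValueError("sample_rate and segment_seconds must be positive")
    segment_length = int(round(sample_rate * segment_seconds))
    if segment_length == 0:
        raise ValueError("segment is shorter than one sample")
    n_full = len(samples) // segment_length
    return [
        samples[i * segment_length:(i + 1) * segment_length]
        for i in range(n_full)
    ]


# Example: 10 seconds of placeholder audio at a hypothetical 16 kHz rate,
# split at each of the segment durations mentioned in the abstract.
sample_rate = 16000
audio = [0.0] * (10 * sample_rate)
for seconds in (2, 3, 4):
    segments = split_into_segments(audio, sample_rate, seconds)
    print(seconds, "s segments:", len(segments))
```

Shorter segments yield more training examples from the same recording but give the classifier less context per example, which is the trade-off the duration analysis in the paper examines.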

