Abstract

Labeling the speech in an audio file with the appropriate dialect is the aim of a dialect identification system. This paper presents a method that uses convolutional neural networks (CNNs) to identify four Assamese dialects: the Goalparia dialect, the Kamrupi dialect, the Eastern Assamese dialect, and the Central Assamese dialect. The study employed the speech patterns of four major regional dialects of Assamese: the Central Assamese dialect, spoken in and around the district of Nagaon; the Eastern Assamese dialect, spoken in the district of Sibsagar and its neighboring areas; the Kamrupi dialect, spoken in the districts of Kamrup, Nalbari, Barpeta, Kokrajhar, and parts of Bongaigaon; and the Goalparia dialect, spoken in the districts of Goalpara, Dhubri, and a portion of Bongaigaon. About two hours of audio samples from each of the four dialects were used to train the classifier. The CNN operates on Mel spectrogram images generated from two to four second segments of raw audio of varying quality. The system's performance is also analyzed with respect to the lengths of the training and test audio samples. The proposed CNN model achieves an accuracy of 90.82 percent, which may be the best result when compared with conventional machine learning models.
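
The abstract describes a pipeline of segmenting raw audio into two-to-four second chunks, converting each chunk to a Mel spectrogram image, and classifying the image with a CNN. The sketch below illustrates that pipeline under stated assumptions; the choice of libraries (librosa, PyTorch), the sampling rate, the number of Mel bands, and the network layout are all illustrative guesses, not the authors' actual configuration.

```python
# Minimal sketch of the described approach: fixed-length audio segments
# -> log-Mel spectrograms -> small CNN with four dialect outputs.
# Library choices and hyperparameters are assumptions for illustration only.
import librosa
import numpy as np
import torch
import torch.nn as nn

def audio_to_mel_segments(path, segment_sec=3.0, sr=16000, n_mels=64):
    """Load an audio file and return log-Mel spectrograms of fixed-length segments."""
    y, sr = librosa.load(path, sr=sr)
    seg_len = int(segment_sec * sr)
    segments = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        chunk = y[start:start + seg_len]
        mel = librosa.feature.melspectrogram(y=chunk, sr=sr, n_mels=n_mels)
        segments.append(librosa.power_to_db(mel, ref=np.max))
    return np.stack(segments) if segments else np.empty((0, n_mels, 0))

class DialectCNN(nn.Module):
    """Small CNN over Mel-spectrogram 'images'; four outputs, one per dialect."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size features regardless of segment length
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, time)
        h = self.features(x)
        return self.classifier(h.flatten(1))

# Usage (hypothetical file name):
# segs = audio_to_mel_segments("sample.wav")
# logits = DialectCNN()(torch.tensor(segs, dtype=torch.float32).unsqueeze(1))
```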
