Abstract
Spoken language identification (SLI) is an active field of research. Many techniques have been proposed for speech processing, including Support Vector Machines, Gaussian Mixture Models, and Decision Trees. This paper uses Mel-Frequency Cepstral Coefficient (MFCC) features extracted from the input speech signal, with Random Forest (RF), Gaussian Mixture Model (GMM), and K-Nearest Neighbor (KNN) as classifiers. Scoring is done on speech segments of 3 s, 10 s, and 30 s, using a dataset of Javanese, Sundanese, and Minang, which are traditional languages of Indonesia. K-Nearest Neighbor reaches 98.88% accuracy on 30 s of speech, followed by Random Forest at 95.55% on 30 s of speech, and GMM at 82.24%.
Highlights
Indonesia is an archipelago in the Southeast Asia region
Random Forest, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) are used as classification techniques; accuracy is measured on speech segmented into 3 seconds, 10 seconds, and 30 seconds
This paper uses Mel-Frequency Cepstral Coefficient (MFCC) features with Random Forest to examine the effect of the number of trees on accuracy, extends the testing method from frame-level scoring to 3 s, 10 s, and 30 s speech segments, and briefly discusses computation time for model training on three traditional languages from Indonesia
Summary
Indonesia is an archipelago in the Southeast Asia region. It consists of large and small islands spread from Sabang to Merauke, which is why Indonesia is called an archipelagic state. Some regional languages in Indonesia have become extinct because they are no longer widely used in their regions. Collecting datasets of regional languages for study can help prevent further extinction, because building a classifier requires a large-scale dataset. A spoken language identification (SLI) system developed from such data can serve as a front-end component of applications such as translators, classifying the regional language for later use in speech-based information systems, speech-based translation, and others. Random Forest, KNN, and GMM are used as classification techniques, and accuracy is measured on speech segmented into 3 seconds, 10 seconds, and 30 seconds. This study examines spoken language identification with these techniques, using GMM, which is commonly used in spoken language identification, as the baseline at the same 3-second, 10-second, and 30-second segment lengths.
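The move from frame-level decisions to 3 s, 10 s, and 30 s scoring can be illustrated with a minimal sketch. The summary does not say how frame-level outputs are combined into a segment decision, so the majority vote used here is an assumption, as are the 10 ms frame hop and the function names, which are hypothetical. The toy labels are invented for illustration and are not results from the paper.

```python
from collections import Counter


def split_into_segments(frame_labels, segment_seconds, hop_seconds=0.01):
    """Group per-frame predictions into consecutive fixed-duration segments.

    hop_seconds is an assumed frame hop (10 ms); the source does not state it.
    A trailing partial segment is dropped.
    """
    frames_per_segment = int(round(segment_seconds / hop_seconds))
    if frames_per_segment <= 0:
        raise ValueError("segment_seconds must cover at least one frame")
    n_full = len(frame_labels) // frames_per_segment
    return [
        frame_labels[i * frames_per_segment:(i + 1) * frames_per_segment]
        for i in range(n_full)
    ]


def majority_vote(labels):
    """Return the most frequent label in a segment (assumed aggregation rule)."""
    return Counter(labels).most_common(1)[0][0]


def segment_accuracy(frame_labels, true_label, segment_seconds, hop_seconds=0.01):
    """Fraction of segments whose majority-vote label equals true_label."""
    segments = split_into_segments(frame_labels, segment_seconds, hop_seconds)
    if not segments:
        raise ValueError("utterance is shorter than one segment")
    decisions = [majority_vote(segment) for segment in segments]
    return sum(d == true_label for d in decisions) / len(decisions)


if __name__ == "__main__":
    # Toy 30 s utterance: the first 2 s of frames are misclassified.
    frames = ["sundanese"] * 200 + ["javanese"] * 2800
    for seconds in (3, 10, 30):
        print(seconds, segment_accuracy(frames, "javanese", seconds))
```

In this toy case the burst of early errors flips one of the ten 3 s segments but is outvoted within the 10 s and 30 s segments, which is one plausible reason longer segments tend to score higher.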