Abstract

Spoken language identification (SLI) is an active field of research. Many techniques have been proposed for speech processing, such as Support Vector Machines, Gaussian Mixture Models, and Decision Trees. This paper uses Mel-Frequency Cepstral Coefficient (MFCC) features of the input speech signal; Random Forest (RF), Gaussian Mixture Model (GMM), and K-Nearest Neighbor (KNN) classifiers; scoring on 3 s, 10 s, and 30 s speech segments; and a dataset consisting of Javanese, Sundanese, and Minang, which are traditional languages of Indonesia. K-Nearest Neighbor reaches 98.88% accuracy on 30 s of speech, followed by Random Forest with 95.55% accuracy on 30 s of speech and GMM with 82.24% accuracy.

Highlights

  • Indonesia is an archipelago in the Southeast Asia region

  • Random Forest, K-Nearest Neighbor (KNN), and Gaussian Mixture Model (GMM) are used as classification techniques; accuracy is measured on speech segmented into 3-second, 10-second, and 30-second units

  • This paper uses Mel-Frequency Cepstral Coefficient (MFCC) features and Random Forest to examine the effect of the number of trees on accuracy, extends the testing method from the frame level to 3 s, 10 s, and 30 s speech segments, and briefly discusses computation time during model training, using three traditional languages from Indonesia


Summary

INTRODUCTION

Indonesia is an archipelago in the Southeast Asia region. It consists of large and small islands spread from Sabang to Merauke, which is why Indonesia is called the Archipelago State. Some regional languages in Indonesia are extinct because they are no longer widely used in their regions. Collecting datasets of regional languages for study can help prevent further extinction, because building a classification technique requires a large-scale dataset. Developing spoken language identification (SLI) helps as well: it can serve as a front-end component of applications such as translators that classify regional languages, which can later be used in speech-based information systems, speech-based translation, and others. Random Forest, KNN, and GMM are used as classification techniques, and accuracy is measured on speech segmented into 3 seconds, 10 seconds, and 30 seconds. This study examines spoken language identification using the techniques mentioned above, with the GMM technique, which is often used in spoken language identification, segmented at 3 seconds, 10 seconds, and 30 seconds as the baseline.
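
The idea of moving from frame-level decisions to 3-second, 10-second, and 30-second decisions can be illustrated with a small sketch. The summary does not say how frame scores are combined, so the majority vote used here is an assumption, as are the rate of 100 feature frames per second and the helper names `segment_votes` and `segment_accuracy`. The frame labels are toy values, not results from the paper.

```python
from collections import Counter
from typing import List, Sequence

# Assumption: one MFCC frame every 10 ms, i.e. 100 frames per second.
FRAMES_PER_SECOND = 100


def segment_votes(frame_labels: Sequence[str], frames_per_segment: int) -> List[str]:
    """Majority-vote the frame-level labels inside each full segment.

    Trailing frames that do not fill a whole segment are ignored.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    decisions = []
    n_full = len(frame_labels) // frames_per_segment
    for i in range(n_full):
        chunk = frame_labels[i * frames_per_segment:(i + 1) * frames_per_segment]
        decisions.append(Counter(chunk).most_common(1)[0][0])
    return decisions


def segment_accuracy(frame_labels: Sequence[str], true_label: str,
                     frames_per_segment: int) -> float:
    """Fraction of segments whose voted label equals the true label."""
    decisions = segment_votes(frame_labels, frames_per_segment)
    if not decisions:
        raise ValueError("not enough frames for a single segment")
    return sum(d == true_label for d in decisions) / len(decisions)


# Toy frame-level predictions for one 30 s Javanese utterance:
# the first 2 s are misclassified as Sundanese, the rest are correct.
frame_preds = ["sundanese"] * 200 + ["javanese"] * 2800

for seconds in (3, 10, 30):
    acc = segment_accuracy(frame_preds, "javanese", seconds * FRAMES_PER_SECOND)
    print(f"{seconds:>2} s segments: accuracy = {acc:.2f}")
```

In this toy case the 3-second segments are more sensitive to the short run of misclassified frames than the 10-second and 30-second segments, because a longer segment gives the vote more frames to average over.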

LITERATURE REVIEW
Dataset
Pre-Processing
Model Development
Evaluation
AND DISCUSSION
Findings
CONCLUSION
