SPEECH TO TEXT FOR THE SASAK LANGUAGE USING MEL-FREQUENCY CEPSTRAL COEFFICIENT FEATURE EXTRACTION AND CONVOLUTIONAL NEURAL NETWORK CLASSIFICATION

Arik Aranta, Belmiro Razak Setiawan, Budi Irmawati

doi:10.29303/jtika.v5i1.235

Abstract

Artificial intelligence technology allows computers to process digital signals such as speech. Currently, speech to text is available only in Indonesian and English versions, not in regional languages. Speech to text is a system that takes human voice input and transcribes it into words. Developing speech to text for regional languages is needed because it can bridge culture and technological progress. A review of five prior studies found that mel-frequency cepstral coefficients (MFCC) and convolutional neural networks (CNN) are a commonly used combination for voice signal analysis, with reported accuracies between 70.00% and 99.00%. This study applies MFCC and CNN to speech to text in order to recognize spoken Sasak and convert it into text. The result of this research is a real-time voice-to-text conversion system for the Sasak language. Using MFCC for feature extraction and CNN as the classifier for the voice dataset, the analysis covers determining the best amount of training data, testing how the number of voice recordings in the training data affects accuracy, and measuring the sensitivity of the algorithm to words that share similar prefixes. The study aims to measure the accuracy achieved on the dataset used and the sensitivity of the algorithm to sentences that resemble one another. The study produced two results. The first is the training result: the CNN reached a training accuracy of 90% with a loss of 0.5%. The second comes from an experiment that tested three voice samples for each of the 50 words in the dataset: 43 words were recognized correctly in all three samples, 6 words in two samples, 1 word in one sample, and no word was wrong in all three samples. This corresponds to 86% of words fully correct, 12% correct twice, 2% correct once, and 0% wrong every time.
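
The abstract names MFCC as the feature extractor but does not give its parameters. The sketch below walks through the usual MFCC pipeline in plain Python: pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. Several values here are common defaults assumed for illustration and were not taken from the study: the 16 kHz sample rate, 400-sample (25 ms) frames, 160-sample (10 ms) hop, 512-point spectrum, 26 mel filters, and 13 coefficients. The function name `mfcc` is also hypothetical. To avoid dependencies the sketch uses a naive DFT, so it is slow. A real system would use an FFT library.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_mels=26, n_ceps=13):
    """Return one list of n_ceps cepstral coefficients per frame (assumed defaults)."""
    # 1. Pre-emphasis boosts high frequencies.
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1]
                          for i in range(1, len(signal))]
    # Zero-pad so that at least one full frame exists.
    emph += [0.0] * max(0, frame_len - len(emph))
    n_frames = 1 + (len(emph) - frame_len) // hop
    # 2. Hamming window applied to every frame.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    # 3. Triangular mel filterbank over the n_fft // 2 + 1 spectrum bins.
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_pts = [top * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_pts]
    fbank = []
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        fbank.append(row)
    # Lookup tables for a naive DFT (slow but dependency-free).
    cos_t = [math.cos(2 * math.pi * j / n_fft) for j in range(n_fft)]
    sin_t = [math.sin(2 * math.pi * j / n_fft) for j in range(n_fft)]
    features = []
    for t in range(n_frames):
        frame = [emph[t * hop + n] * window[n] for n in range(frame_len)]
        # 4. Power spectrum of the frame, implicitly zero-padded to n_fft points.
        power = []
        for k in range(n_bins):
            re = sum(x * cos_t[(k * n) % n_fft] for n, x in enumerate(frame))
            im = sum(x * sin_t[(k * n) % n_fft] for n, x in enumerate(frame))
            power.append((re * re + im * im) / n_fft)
        # 5. Log mel-filterbank energies, floored to avoid log(0).
        log_mel = [math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
                   for row in fbank]
        # 6. Unnormalised DCT-II decorrelates the log energies.
        ceps = [sum(log_mel[m] * math.cos(math.pi * i * (m + 0.5) / n_mels)
                    for m in range(n_mels))
                for i in range(n_ceps)]
        features.append(ceps)
    return features
```

Each row of the result is one frame. Stacking the rows gives a frames × coefficients matrix that a CNN can treat like a small single-channel image.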
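
The abstract does not describe the CNN architecture. The next sketch is a hypothetical, minimal forward pass that shows how a CNN can classify an MFCC matrix. It applies one layer of 3×3 convolutions, then ReLU, global average pooling, a dense layer, and softmax over the candidate words. The class `TinyCNN`, its layer sizes, and the placeholder labels are assumptions for illustration. The weights are random and untrained, and the training step (backpropagation on labelled recordings) is omitted.

```python
import math
import random


def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation of matrix x with kernel (what a conv layer computes)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(kernel[i][j] * x[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyCNN:
    """Hypothetical CNN: conv -> ReLU -> global average pool -> dense -> softmax.

    Weights are random (untrained); a real system would learn them from data.
    """

    def __init__(self, n_filters, n_classes, seed=0):
        rng = random.Random(seed)
        self.kernels = [[[rng.gauss(0.0, 0.1) for _ in range(3)]
                         for _ in range(3)]
                        for _ in range(n_filters)]
        self.dense = [[rng.gauss(0.0, 0.1) for _ in range(n_filters)]
                      for _ in range(n_classes)]
        self.bias = [0.0] * n_classes

    def forward(self, mfcc_matrix):
        """mfcc_matrix: frames x coefficients, at least 3 x 3."""
        pooled = []
        for kernel in self.kernels:
            fmap = conv2d_valid(mfcc_matrix, kernel)
            activations = [max(0.0, v) for row in fmap for v in row]  # ReLU
            pooled.append(sum(activations) / len(activations))      # global average pooling
        logits = [sum(w * p for w, p in zip(weights, pooled)) + b
                  for weights, b in zip(self.dense, self.bias)]
        return softmax(logits)


def predict(model, mfcc_matrix, vocabulary):
    """Return the most probable word label and the full probability list."""
    probs = model.forward(mfcc_matrix)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return vocabulary[best], probs
```

In use, `vocabulary` would list the dataset's words in the same order as the output classes. Placeholder labels such as `"word_a"` stand in for the actual Sasak words, which the abstract does not list.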
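
The second result can be checked with simple arithmetic. The four groups cover 43 + 6 + 1 + 0 = 50 words, and 43/50 = 86%, 6/50 = 12%, 1/50 = 2%, 0/50 = 0%. The sketch below tallies such a per-word test. The function name `summarize_word_test` and the `word_NN` labels are hypothetical placeholders, not the study's code or vocabulary.

```python
from collections import Counter


def summarize_word_test(correct_per_word, samples_per_word=3):
    """Group words by how many of their test samples were recognised correctly.

    correct_per_word maps each word to its number of correct samples (0..samples_per_word).
    Returns {k: (number_of_words, percentage_of_words)} for k from samples_per_word down to 0.
    """
    if not correct_per_word:
        raise ValueError("no words to summarise")
    for word, k in correct_per_word.items():
        if not 0 <= k <= samples_per_word:
            raise ValueError(f"{word!r}: {k} correct out of {samples_per_word}")
    counts = Counter(correct_per_word.values())
    total = len(correct_per_word)
    return {k: (counts[k], 100.0 * counts[k] / total)
            for k in range(samples_per_word, -1, -1)}


# Rebuild the reported tallies using placeholder word labels.
reported = {}
for n_words, n_correct in [(43, 3), (6, 2), (1, 1), (0, 0)]:
    for _ in range(n_words):
        reported[f"word_{len(reported):02d}"] = n_correct

for k, (n, pct) in summarize_word_test(reported).items():
    print(f"{k}/3 correct: {n} word(s) ({pct:.0f}%)")
# 3/3 correct: 43 word(s) (86%)
# 2/3 correct: 6 word(s) (12%)
# 1/3 correct: 1 word(s) (2%)
# 0/3 correct: 0 word(s) (0%)
```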
