Abstract

Speech recognition, or speech-to-text, has been studied extensively. Speech-to-text technology converts human speech into written text. Some prior speech-to-text studies have reported accuracy of up to 95% on English datasets using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Convolutional Neural Network (CNN) for classification. This research applies the same methods, MFCC and CNN, and reports the training process and the resulting accuracy under an analysis scenario that uses datasets of 50, 150, 250, and 350 voice samples. Training on 350 English voice samples achieved 95% accuracy. The analysis seeks the best dataset composition for the Sasak language by comparing the test accuracy with the earlier training accuracy on the English dataset. The training and testing results show that the best dataset composition for the Sasak language uses nine speakers. This suggests that the Sasak language requires fewer human resources than the English dataset, which involves more than 30 speakers for 50 words. This saves resources and time in the development of a Sasak-language speech recognition system.
