Classification of English Speaker Dialects Using Random Forest (Klasifikasi Dialek Pengujar Bahasa Inggris Menggunakan Random Forest)

Muhamad Azhar, Hilman Ferdinandus Pardede

doi:10.30865/mib.v5i2.2754


Open Access


Abstract

Speech recognition is an important research field that is now widely used in a variety of applications. However, speech recognition performance is affected by the dialect of the speaker, so dialect recognition is often used as an additional feature in speech recognition. Recognizing dialects is not easy. Machine learning is currently widely applied to dialect recognition. One of the challenges in machine learning-based dialect recognition is class imbalance and class overlap, which affect a wide variety of classification techniques. This study applies Random Forest combined with oversampling techniques for dialect recognition. For hyper-parameter optimization of the Random Forest algorithm, we apply the Grid Search method. Experiments on Speech Accent Archive data using MFCC features resulted in an accuracy of 0.91 and an AUC of 0.95.

Highlights

Speech recognition is an important research field that is now widely used in a variety of applications [1]
Speech recognition performance is affected by the dialect of the speaker

Summary

INTRODUCTION

Speech recognition is an important research field that is now widely used in a variety of applications [1]. Speech recognition requires a method for extracting features from the speech signal; the extracted features are then processed and matched against a particular model. Data-level approaches (sampling) can be used to modify the class distribution of the training data in order to balance it [13]; a data-level approach is itself a preprocessing step carried out before the machine learning model is built [12]. We propose Random Forest (RF) for a dialect recognition system, using Random Over Sampling (ROS) and SMOTE to address data imbalance. This study uses the speech accent dataset, which can be downloaded from the repository at http://accent.gmu.edu/ (accessed 13 October 2020).

RESEARCH METHODOLOGY

Resampling
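
The study names Random Over Sampling (ROS) and SMOTE as its resampling methods. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names `random_over_sample` and `smote_like`, the toy data, and the parameter values are all hypothetical, and `smote_like` is a simplified interpolation in the spirit of SMOTE that assumes the minority class has at least two rows.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest class (Random Over Sampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    rows = [x for x, lab in zip(X, y) if lab == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Nearest minority neighbours by squared Euclidean distance.
        others = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        neighbour = rng.choice(others)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy two-dimensional features; real inputs would be MFCC vectors.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [5.0, 5.1], [5.2, 4.9]]
y = ["A", "A", "A", "A", "B", "B"]

X_ros, y_ros = random_over_sample(X, y)
new_rows = smote_like(X, y, minority="B", n_new=2, k=1)
```

ROS only repeats existing rows, while the SMOTE-style function produces new points on the line segment between two minority rows. In either case, resampling would normally be applied to the training split only, so that duplicated or synthetic rows do not leak into the evaluation data.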

Random Forest and Grid Search
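
The abstract states that Grid Search is used to tune the Random Forest hyper-parameters. As a rough illustration, this can be sketched with scikit-learn's `GridSearchCV` and `RandomForestClassifier`. The synthetic data, the choice of 13 features as a stand-in for MFCC vectors, and the parameter grid are assumptions made for this example; the grid the authors actually searched is not given in this text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for MFCC feature vectors (assumption: 13 features).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; the study's actual settings are not shown here.
param_grid = {"n_estimators": [25, 50], "max_depth": [4, None]}

# Every combination in the grid is scored with 3-fold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

best_params = search.best_params_
# Accuracy of the refitted best model on the held-out split.
test_accuracy = search.score(X_test, y_test)
```

The search is fitted on the training split only, and the held-out split is used once at the end, so the reported accuracy is not influenced by the hyper-parameter selection.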

Dataset

Grid Search Settings

Audio Data After MFCC Feature Extraction

Resampling Results with ROS or SMOTE

Model Testing

CONCLUSION


Journal: Jurnal media informatika Budidarma
Publication Date: Apr 25, 2021
License type: CC BY 4.0
