Enhanced multi-ethnic speech recognition using pitch shifting generative adversarial networks

Kristiawan Nugroho, Felix Sutanto, Dhendra Marutho, Kristophorus Hadiono, Omar Farooq

doi:10.11591/ijai.v13.i3.pp2904-2911

https://doi.org/10.11591/ijai.v13.i3.pp2904-2911

Abstract

Speech recognition remains a challenging research area, and various approaches have been applied to build robust models. A common problem is overfitting, especially when there is insufficient data to train the model; a sufficiently large dataset lets the model train well and reach high accuracy. Data augmentation is often used to increase the size of a dataset. This research applies one such technique, pitch shifting, to enlarge a speech dataset, which is then converted into spectrograms and classified using a generative adversarial network (GAN). The resulting pitch shifting-generative adversarial network (PS-GAN) model achieves an accuracy of 98.43% in multi-ethnic speech recognition, which is higher than that reported in several similar studies.
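
The augmentation and spectrogram steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the semitone steps, and the STFT parameters (`n_fft`, `hop`) are all assumptions chosen for illustration, and the pitch shift is a simple resampling that also changes the clip duration, unlike duration-preserving methods such as `librosa.effects.pitch_shift`. The GAN classifier itself is not shown.

```python
import numpy as np

def pitch_shift_resample(signal, n_steps):
    """Shift pitch by n_steps semitones via resampling (duration changes too)."""
    rate = 2.0 ** (n_steps / 12.0)           # frequency ratio for the shift
    n_out = int(round(len(signal) / rate))   # higher pitch -> shorter signal
    src_pos = np.arange(n_out) * rate        # fractional read positions
    return np.interp(src_pos, np.arange(len(signal)), signal)

def magnitude_spectrogram(signal, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def augment(signal, steps=(-2, 2)):
    """Return spectrograms for the original clip and each pitch-shifted copy.

    The steps (in semitones) are hypothetical; the abstract does not state them.
    """
    variants = [signal] + [pitch_shift_resample(signal, s) for s in steps]
    return [magnitude_spectrogram(v) for v in variants]
```

Under these assumptions, each input clip yields one spectrogram for the original plus one per pitch-shifted copy, so two shift steps triple the number of training examples available to the classifier.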
