<p><span lang="EN-US">Speech recognition remains a challenging research area, and various approaches have been applied to build robust models. A common problem is overfitting, especially when there is insufficient data to train the model; a sufficiently large dataset lets the model train well and reach high accuracy. Data augmentation is often used to increase the size of a dataset. This research uses pitch shifting as a data augmentation approach to enlarge a speech dataset. The augmented audio is converted into spectrograms and then classified using a generative adversarial network (GAN). The resulting pitch shifting-generative adversarial network (PS-GAN) model achieves an accuracy of 98.43% in multi-ethnic speech recognition, higher than that reported in several similar studies.</span></p>