Audio Scene Classification Based on Gated Recurrent Unit

Lidong Yang, Zhuangzhuang Zhang, Jiangtao Hu

doi:10.1109/icsidp47821.2019.9173051

Abstract

Audio Scene Classification (ASC) is an important part of acoustic scene understanding. Recurrent Neural Network (RNN) models have been widely and successfully adopted for ASC. In this paper, the Gated Recurrent Unit (GRU) is introduced for ASC. Mel-Frequency Cepstral Coefficients (MFCCs), which approximate the auditory characteristics of the human ear, are extracted as feature vectors and used as inputs to the classifier. The GRU is a simplified variant of the Long Short-Term Memory (LSTM) unit. GRUs are used to build a network model for ASC that keeps the number of parameters under control relative to the performance achieved. In the experiments, the model reached an average accuracy of 83.8% on the UrbanSound8K dataset, a significant advantage over traditional network models.
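
The claim that the GRU is simpler than the LSTM can be made concrete by counting trainable parameters. The sketch below is illustrative only and is not taken from the paper: it assumes the textbook formulation with one bias vector per block (some libraries use two bias vectors for the GRU, which changes the totals slightly), and the input size of 40 MFCCs and hidden size of 128 are hypothetical values chosen for the example.

```python
def recurrent_params(input_dim, hidden_dim, num_blocks):
    """Count weights and biases of one gated recurrent layer.

    Each block (a gate or a candidate state) has an input weight matrix,
    a recurrent weight matrix, and one bias vector.
    """
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return num_blocks * per_block


def gru_params(input_dim, hidden_dim):
    # GRU: update gate, reset gate, candidate state -> 3 blocks
    return recurrent_params(input_dim, hidden_dim, 3)


def lstm_params(input_dim, hidden_dim):
    # LSTM: input, forget, and output gates plus candidate cell -> 4 blocks
    return recurrent_params(input_dim, hidden_dim, 4)


if __name__ == "__main__":
    # Hypothetical sizes: 40 MFCCs per frame, 128 hidden units (assumptions).
    input_dim, hidden_dim = 40, 128
    gru = gru_params(input_dim, hidden_dim)
    lstm = lstm_params(input_dim, hidden_dim)
    print(f"GRU parameters:  {gru}")
    print(f"LSTM parameters: {lstm}")
    print(f"GRU / LSTM ratio: {gru / lstm:.2f}")
```

Under these assumptions a GRU layer has three quarters of the parameters of an LSTM layer of the same width, regardless of the chosen input and hidden sizes.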

Full Text