Abstract

There are two major questions in Environmental Sound Classification (ESC): what is the best audio recognition framework, and what is the most robust audio feature? To investigate these questions, this paper uses a Gated Recurrent Unit (GRU) network to analyze the effect of single features, namely the Mel-scale spectrogram (Mel), log-Mel-scale spectrogram (LM), and Mel-frequency cepstral coefficients (MFCC), as well as the multi-features Mel-MFCC, LM-MFCC, and Mel-LM-MFCC (T-M). The experimental results show that in ESC tasks, multi-features outperform single features of the same dimensionality, and LM-MFCC is the most robust. In addition, reverse-sequence MFCC (R-MFCC) and mixed forward-and-reverse-sequence MFCC (FR-MFCC) are proposed to study the effect of sequence changes on audio; the results show that such sequence transformations of audio features have little influence on recognition. To investigate the ESC task further, we introduce the attention weight similar (AWS) model into the multi-feature setting. The AWS model allows the attention weights of different audio features of the same sound to learn from each other, enabling the GRU-AWS model to focus on frame-level features more effectively. The experimental results show that GRU-AWS achieves excellent performance with a recognition rate of 94.3%, outperforming other state-of-the-art methods.
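To make the feature pipeline described above concrete, the following is a minimal illustrative sketch (not the authors' released code or the AWS model) of how the single features Mel, LM, and MFCC can be extracted and concatenated into a multi-feature such as LM-MFCC, then fed to a plain GRU classifier. All hyperparameters (n_mels, n_mfcc, hop length, hidden size, number of classes) and the file name "example.wav" are assumptions for illustration only.

```python
# Illustrative sketch: Mel / log-Mel / MFCC extraction, LM-MFCC multi-feature,
# and a baseline GRU classifier (without the AWS attention mechanism).
import librosa
import numpy as np
import torch
import torch.nn as nn

def extract_lm_mfcc(path, sr=22050, n_mels=40, n_mfcc=40, hop_length=512):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                         hop_length=hop_length)   # Mel
    log_mel = librosa.power_to_db(mel)                             # LM
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                hop_length=hop_length)             # MFCC
    # Multi-feature: concatenate along the feature axis (LM-MFCC)
    lm_mfcc = np.concatenate([log_mel, mfcc], axis=0)              # (80, frames)
    return lm_mfcc.T                                               # (frames, 80)

class GRUClassifier(nn.Module):
    """Plain GRU over frame-level features; class counts are assumed."""
    def __init__(self, feat_dim=80, hidden=128, n_classes=10):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                  # x: (batch, frames, feat_dim)
        _, h = self.gru(x)                 # h: (1, batch, hidden)
        return self.fc(h[-1])              # class logits

feats = torch.tensor(extract_lm_mfcc("example.wav"), dtype=torch.float32)
logits = GRUClassifier()(feats.unsqueeze(0))
```

The reversed-sequence variants (R-MFCC, FR-MFCC) discussed in the abstract correspond to reversing or mixing the frame order of such feature matrices before they are passed to the GRU.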
