Abstract

Sound-event recognition often uses time-frequency analysis to produce an image-like spectrogram, a rich visual representation of the original signal in time and frequency. Convolutional neural networks (CNNs), which can learn discriminative spectrogram patterns, are well suited to sound-event recognition. However, relatively little work has enabled CNNs to make full use of the important temporal information. In this paper, we propose MCRNN, a convolutional recurrent neural network (CRNN) architecture for sound-event recognition; the letter “M” in the name “MCRNN” denotes the model's multi-sized convolution filters. Richer features are extracted by using several different convolution filter sizes at the last convolution layer. In addition, cochleagram images, rather than the traditional spectrogram images of a sound signal, are used as the network input. Experiments on the RWCP dataset show that the proposed method achieves a recognition rate of 98.4% in clean conditions and robustly outperforms existing methods, with the recognition rate increasing by 0.9%, 1.9% and 10.3% at signal-to-noise ratios (SNR) of 20 dB, 10 dB and 0 dB, respectively.
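
The idea of multi-sized convolution filters feeding a recurrent layer can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the filter sizes (1, 3, 5), the averaging kernels, the 8 x 12 random input standing in for a cochleagram, and all function names are assumptions chosen only for illustration. Each filter is applied with zero padding so every feature map keeps the input size, the maps are stacked as channels, and the result is rearranged into one feature vector per time frame, which is the form a recurrent layer would consume.

```python
import random


def conv2d_same(image, kernel):
    """2-D cross-correlation with zero padding; output has the input's size.

    Assumes the kernel has odd height and width.
    """
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - ph, c + j - pw
                    # Positions outside the image count as zeros.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def multi_size_features(image, kernels):
    """Apply every kernel to the same input; one feature map per kernel."""
    return [conv2d_same(image, k) for k in kernels]


def to_time_sequence(feature_maps):
    """Turn (channels, freq, time) maps into one vector per time frame.

    Each vector has length channels * freq, ordered channel by channel.
    """
    n_time = len(feature_maps[0][0])
    return [
        [fmap[f][t] for fmap in feature_maps for f in range(len(fmap))]
        for t in range(n_time)
    ]


def averaging_kernel(size):
    """Hypothetical stand-in for a learned filter of the given size."""
    return [[1.0 / (size * size)] * size for _ in range(size)]


# Toy input: rows are frequency channels, columns are time frames.
rng = random.Random(0)
n_freq, n_time = 8, 12
image = [[rng.random() for _ in range(n_time)] for _ in range(n_freq)]

kernel_sizes = (1, 3, 5)  # assumed sizes, not taken from the paper
kernels = [averaging_kernel(s) for s in kernel_sizes]

feature_maps = multi_size_features(image, kernels)
sequence = to_time_sequence(feature_maps)
print(len(feature_maps), len(sequence), len(sequence[0]))  # 3 12 24
```

In a trained network the kernels would be learned and there would be many filters per size; the point here is only that differently sized filters see different time-frequency contexts of the same input, and that their outputs can be concatenated before the recurrent stage.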
