Abstract

In recent years, convolutional neural networks (CNNs) have achieved remarkable success in a variety of image and audio classification and detection tasks. Using dilated convolutions in a CNN enlarges the network's receptive field and can improve its performance; it also allows the CNN to be compressed into a lightweight model. Environmental Sound Classification (ESC) is an essential part of the non-speech audio classification problem and has attracted the interest of the scientific community following breakthroughs in learning algorithms. In this study, we propose a novel dilation-based CNN model built from scratch, together with a baseline classic CNN model trained under the same conditions and parameters. The dilation-based approach is flexible and can be applied to several standard CNNs to improve their performance. In addition, this strategy offers high training efficiency. Experimental results on the publicly available UrbanSound8K dataset indicate that the proposed technique outperforms the classic CNN methods: the dilation-based network improves accuracy and F1 score by around 4-9% compared to the baseline method.
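The receptive-field claim can be made concrete with a small sketch. The code below is an illustration only, not the authors' model: it assumes 1-D, stride-1, "valid" convolutions with 3-tap kernels and uses hypothetical dilation rates (1, 2, 4). A kernel of size k with dilation d spans k + (k - 1)(d - 1) input samples while keeping only k weights, and a stack of stride-1 layers has a receptive field of 1 + Σ(k_i - 1)·d_i. The same formula applies per axis to the 2-D convolutions typically used on audio spectrograms.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation, as used in CNN layers).

    Kernel taps are spaced `dilation` samples apart, so a kernel of size k
    covers k + (k - 1) * (dilation - 1) input samples with only k weights.
    """
    k = len(w)
    span = k + (k - 1) * (dilation - 1)  # effective kernel extent
    if span > len(x):
        return []  # input too short for a single valid output
    return [
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_sizes: Sequence[int], dilations: Sequence[int]) -> int:
    """Receptive field of stacked stride-1 convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


if __name__ == "__main__":
    # Hypothetical 3-layer stacks with 3-tap kernels (same weight count per layer).
    print(receptive_field([3, 3, 3], [1, 1, 1]))  # 7 samples, standard convolutions
    print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15 samples, dilated convolutions
    # With dilation 2, the kernel [1, 0, -1] compares x[i] with x[i + 4].
    print(dilated_conv1d(list(range(10)), [1, 0, -1], dilation=2))  # [-4, -4, -4, -4, -4, -4]
```

Both stacks in this sketch use the same number of weights, yet the dilated one sees more than twice as much context, which is the property the dilation-based approach relies on.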
