Abstract
Environmental Sound Classification (ESC) plays a vital role in machine auditory scene perception. Deep learning based ESC methods, such as the Dilated Convolutional Neural Network (D-CNN), have achieved state-of-the-art results on public datasets. However, the D-CNN ESC model is often larger than 100MB and is only suitable for systems with powerful GPUs, which prevents its application in handheld devices. In this study, we take the D-CNN ESC framework and focus on reducing the model size while maintaining the ESC performance. As a result, a lightweight D-CNN (termed LD-CNN) ESC system is developed. Our contribution is twofold. First, we propose to reduce the number of parameters in the convolution layers by factorizing a two-dimensional convolution filter ($L \times W$) into two separable one-dimensional convolution filters ($L \times 1$ and $1 \times W$). Second, we propose to replace the first fully connected layer (FCL) with a Feature Sum layer (FSL) to further reduce the number of parameters. This is motivated by our finding that the features of environmental sounds have a weak absolute locality property, so a global sum operation can be applied to compress the feature map. Experiments on three public datasets (ESC50, UrbanSound8K, and CICESE) show that the proposed system offers comparable classification performance with a much smaller model size. For example, the model size of our proposed system is about 2.05MB, which is 50 times smaller than the original D-CNN model, at a loss of only 1%-2% classification accuracy.
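To make the two parameter-saving ideas concrete, the following is a minimal sketch in plain Python rather than the authors' implementation. It counts the weights of a full $L \times W$ convolution against an $L \times 1$ followed by a $1 \times W$ convolution, and the weights of a fully connected layer fed by a flattened feature map against one fed by a globally summed feature map. The function names, the choice of an intermediate channel count equal to the output channel count, and the reading of the FSL as a per-channel sum over all positions are assumptions made for illustration; the abstract does not specify these details.

```python
def conv2d_params(c_in, c_out, kh, kw, bias=True):
    """Weight count of a 2-D convolution with a kh x kw kernel."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)


def factorized_conv_params(c_in, c_out, kh, kw, bias=True):
    """Weight count of a (kh x 1) convolution followed by a (1 x kw) one.

    Assumption: the intermediate channel count equals c_out.
    """
    first = conv2d_params(c_in, c_out, kh, 1, bias)
    second = conv2d_params(c_out, c_out, 1, kw, bias)
    return first + second


def dense_params(n_in, n_out, bias=True):
    """Weight count of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def global_sum(feature_map):
    """Sum each channel over all of its positions.

    feature_map is a list of channels, each channel a list of rows.
    Assumption: this is one plausible reading of the Feature Sum layer.
    """
    return [sum(sum(row) for row in channel) for channel in feature_map]


# Hypothetical layer sizes, chosen only to illustrate the arithmetic.
channels, height, width, hidden = 80, 10, 10, 500

full_conv = conv2d_params(channels, channels, 5, 5)
separable_conv = factorized_conv_params(channels, channels, 5, 5)

# A dense layer on the flattened map sees channels * height * width inputs;
# after a global sum it sees only one value per channel.
dense_on_flat = dense_params(channels * height * width, hidden)
dense_on_sum = dense_params(channels, hidden)
```

With these hypothetical sizes the factorized convolution needs well under half the weights of the full $5 \times 5$ convolution, and the dense layer after the global sum shrinks by roughly the number of positions in the feature map, which is where most of the saving comes from.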