Abstract

In this letter, we propose a concise feature representation framework for acoustic scene classification by pruning embeddings obtained from SoundNet, a deep convolutional neural network. We demonstrate that the feature maps generated at various layers of SoundNet exhibit redundancy. The proposed singular value decomposition (SVD) based method reduces this redundancy, relying on the assumption that useful feature maps produce embeddings that lie along different directions for different classes. Feature maps that produce similar embeddings across classes are therefore ignored. When an ensemble of classifiers is applied to the various layers of SoundNet, pruning the redundant feature maps reduces both dimensionality and computational complexity. Our experiments on acoustic scene classification demonstrate that ignoring 73% of the feature maps degrades performance by less than 1% while reducing computational complexity by 12.67%. In addition, we show that the proposed pruning framework can be used to remove filters from the SoundNet architecture itself, reducing model storage requirements by a factor of 13 and shrinking the number of parameters from 28 million to 2 million with only marginal degradation in performance. The compact model obtained by applying the proposed pruning procedure is evaluated on several acoustic scene classification datasets and shows excellent generalization ability.
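To make the pruning idea concrete, the sketch below illustrates one plausible SVD-based scoring of feature maps, assuming per-clip embeddings pooled from a SoundNet layer and integer class labels. The function names, the specific score (one minus the fraction of energy in the leading singular value of the class-mean matrix), and the keep ratio are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def feature_map_scores(embeddings, labels):
    """Score each feature map by how well its class-wise mean responses
    spread along different directions (higher = more discriminative).

    embeddings: array of shape (num_clips, num_maps, embed_dim), where
                embeddings[i, m] is the pooled response of feature map m
                for clip i at a given SoundNet layer (assumed input format).
    labels:     array of shape (num_clips,) with integer class ids.
    """
    classes = np.unique(labels)
    num_maps = embeddings.shape[1]
    scores = np.zeros(num_maps)
    for m in range(num_maps):
        # Class-conditional mean response of this feature map.
        class_means = np.stack(
            [embeddings[labels == c, m].mean(axis=0) for c in classes]
        )
        # SVD of the class-mean matrix: if the class means are nearly
        # collinear, one singular value dominates and the map is treated
        # as redundant (it gives similar embeddings for different classes).
        s = np.linalg.svd(class_means, compute_uv=False)
        scores[m] = 1.0 - s[0] / (s.sum() + 1e-12)
    return scores

def prune_feature_maps(embeddings, labels, keep_ratio=0.27):
    """Return indices of the top `keep_ratio` fraction of feature maps;
    keep_ratio=0.27 mirrors the 73% pruning level reported in the abstract."""
    scores = feature_map_scores(embeddings, labels)
    num_keep = max(1, int(round(keep_ratio * len(scores))))
    return np.argsort(scores)[::-1][:num_keep]
```

The retained indices could then be used either to select embedding dimensions before the layer-wise classifier ensemble or to drop the corresponding convolutional filters from the network, as described above.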
