Abstract

Model compression and acceleration are attracting increasing attention due to the demand from embedded devices and mobile applications. Research on efficient convolutional neural networks (CNNs) aims at removing feature redundancy by decomposing or optimizing the convolutional calculation. In this work, feature redundancy is assumed to exist among channels in CNN architectures, which provides some leeway to boost calculation efficiency. Aiming at channel compression, a novel convolutional construction named compact convolution is proposed to embrace the progress in spatial convolution, channel grouping and pooling operations. Specifically, the depth-wise separable convolution and the point-wise interchannel operation are utilized to extract features efficiently. Unlike existing channel compression methods, which usually introduce considerable learnable weights, the proposed compact convolution can reduce feature redundancy with no extra parameters. With the point-wise interchannel operation, compact convolutions implicitly squeeze the channel dimension of feature maps. To explore the rules for reducing channel redundancy in neural networks, a comparison is made among different point-wise interchannel operations. Moreover, compact convolutions are extended to tackle multiple tasks, such as acoustic scene classification, sound event detection and image classification. Extensive experiments demonstrate that our compact convolution not only exhibits high effectiveness in several multimedia tasks, but can also be implemented efficiently by benefiting from parallel computation.
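The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the point-wise interchannel operation is a pooling (max or mean) over groups of adjacent channels at each spatial position, and that the spatial step is a plain "valid" depth-wise convolution with stride 1. The names `depthwise_conv`, `channel_pool`, `compact_conv` and the parameter `group` are hypothetical and chosen for illustration only.

```python
from typing import Callable, Iterable, List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def depthwise_conv(x: FeatureMap, kernels: FeatureMap) -> FeatureMap:
    """Depth-wise convolution (valid padding, stride 1).

    Each channel is filtered by its own k x k kernel; channels are not mixed.
    """
    if len(x) != len(kernels):
        raise ValueError("one kernel per input channel is required")
    out = []
    for chan, kern in zip(x, kernels):
        k = len(kern)
        h, w = len(chan), len(chan[0])
        out.append([
            [
                sum(chan[i + di][j + dj] * kern[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def mean(values: Iterable[float]) -> float:
    """Arithmetic mean of an iterable (used as an alternative pooling op)."""
    vals = list(values)
    return sum(vals) / len(vals)


def channel_pool(x: FeatureMap, group: int,
                 op: Callable[[Iterable[float]], float] = max) -> FeatureMap:
    """Assumed point-wise interchannel operation.

    Reduces every `group` adjacent channels to one channel at each spatial
    position. It has no learnable weights.
    """
    if len(x) % group != 0:
        raise ValueError("channel count must be divisible by group")
    h, w = len(x[0]), len(x[0][0])
    return [
        [[op(x[c + g][i][j] for g in range(group)) for j in range(w)]
         for i in range(h)]
        for c in range(0, len(x), group)
    ]


def compact_conv(x: FeatureMap, kernels: FeatureMap, group: int,
                 op: Callable[[Iterable[float]], float] = max) -> FeatureMap:
    """Sketch of a compact convolution: depth-wise conv, then channel pooling."""
    return channel_pool(depthwise_conv(x, kernels), group, op)
```

Under these assumptions, an input with C channels leaves the block with C / `group` channels, and the only weights are the C depth-wise kernels. For comparison, a learnable 1×1 convolution performing the same squeeze from C to C / `group` channels would need C · C / `group` weights (ignoring biases). Passing `op=mean` instead of the default `max` swaps in a different interchannel operation.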

Highlights

  • Convolutional neural networks (CNNs) are attracting considerable attention in an increasing array of areas, such as computer vision [1]–[3], computational acoustics [4]–[6] and natural language processing [7]–[9].

  • We found that feature redundancy exists among channels in CNN architectures, i.e., a considerable amount of interchannel information is unimportant or even unnecessary in some cases.

  • Rather than seeking a better function approximator, this paper focuses on efficient approaches for reducing interchannel redundancy and compressing the dimension of feature maps over a larger range.

Summary

Introduction

Convolutional neural networks (CNNs) are attracting considerable attention in an increasing array of areas, such as computer vision [1]–[3], computational acoustics [4]–[6] and natural language processing [7]–[9]. The general trend is to design deeper and more complicated network architectures in pursuit of better performance. Massive resources are required to reach the desired performance, which hinders CNN-based classifiers from performing real-time inference in mobile applications. Over the past few decades, various methods have been explored for model compression and acceleration, including pruning [10]–[13], weight sharing [14], [15], low-rank matrix factorization [16]–[18] and knowledge distillation [19]–[21]. Despite their desirable compression abilities, most of these compression methods typically suffer from two major drawbacks. Various manually chosen parameters (and even a great deal of empirical engineering that only experts are competent to handle) are required in these methods.

The associate editor coordinating the review of this manuscript and approving it for publication was Seok-Bum Ko.

Methods
Results
Conclusion
