Abstract
Pooling layers appear widely in deep networks because they aggregate information within a local region and enable fast downsampling. Since the layers closer to the output learn high-level semantic information related to classification, global average pooling suppresses the contribution of locally high-magnitude features within the global region. Moreover, the gradients of these distinctive features are considerably attenuated by the large region size of global average pooling. In this paper, we propose a global learnable pooling operation, named GLPool, to enhance distinctive high-level features in the global region. Because it is located immediately before the classification layer, GLPool has a strong influence on network performance. Unlike hand-crafted pooling operations, GLPool adapts to inputs of any size, and since it adds only a few parameters it serves as a plug-and-play layer. Visualization via class activation maps (CAM) on GoogLeNet and ShuffleNet-v2 shows that GLPool learns more concentrated and distinctive high-level features than global average pooling. Experiments on several classical deep models demonstrate significant performance improvements on the ImageNet32 and CIFAR100 datasets, with gains that are especially pronounced for lightweight networks.
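To make the idea concrete, the following is a minimal PyTorch sketch of a learnable global pooling layer. The paper's exact parameterization of GLPool is not given here, so the scheme below (a small learnable spatial weight map, bilinearly resized to the input and softmax-normalized) and the names GLPool and base_size are assumptions for illustration, not the definitive implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GLPool(nn.Module):
    # Sketch of a global learnable pooling layer: a small learnable
    # spatial weight map is resized to the input and softmax-normalized,
    # so the pooling weights are learned rather than fixed to a uniform
    # average. This parameterization is an assumption for illustration.
    def __init__(self, base_size=7):
        super().__init__()
        # base_size * base_size parameters in total ("few parameters")
        self.weight = nn.Parameter(torch.zeros(1, 1, base_size, base_size))

    def forward(self, x):
        # x: (N, C, H, W); resize the weight map so the layer adapts
        # to any input size
        w = F.interpolate(self.weight, size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        # normalize the weights over all spatial positions
        w = torch.softmax(w.flatten(2), dim=-1).view_as(w)
        # weighted sum over the global region -> (N, C)
        return (x * w).sum(dim=(-2, -1))

# usage: a drop-in replacement for global average pooling
y = GLPool()(torch.randn(2, 512, 7, 7))   # y has shape (2, 512)

Note that with the weight map initialized to zeros, the softmax yields uniform weights, so this layer starts out as exact global average pooling and learns to deviate from it during training.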
Highlights
Convolution and pooling operations play a significant role in deep learning
We propose a global learnable pooling (GLPool) operation to enhance the learning of distinctive features in aggregated global information
Since GLPool operates on a feature map, a class activation map (CAM) can be used to display its active region; a sketch of the standard CAM computation follows this list
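For reference, this is a minimal sketch of the standard CAM computation (Zhou et al., 2016) for a network whose head is a global pooling followed by a linear classifier; the function name and argument layout are ours, not the paper's.

import torch

def class_activation_map(features, fc_weight, class_idx):
    # Standard CAM: weight each feature map by the classifier weight
    # for the target class, then sum over channels.
    # features:  (C, H, W) feature maps entering the pooling layer
    # fc_weight: (num_classes, C) weights of the final linear layer
    cam = torch.einsum("c,chw->hw", fc_weight[class_idx], features)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)   # scale to [0, 1] for display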
Summary
Convolution and pooling operations play a significant role in deep learning. Since pooling operations are parameterless and consume almost no computing resources, they are widely used to reduce the size of feature maps in the network. Inheriting the design concept of LeNet [1], both AlexNet [2] and VGG [3] use convolution and pooling as the main components of the network. Pooling operations aggregate information within a local region to retain useful features; they are also applied to extract features in GoogLeNet [4]–[6], which considerably increases feature richness. To remove the fully connected layers of convolutional networks, NIN (Network in Network) [7] proposes the global average pooling (GAP) operation, which effectively avoids over-fitting and is more consistent with the convolutional computation of a CNN.
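As a concrete illustration of GAP as introduced in NIN [7], the following minimal PyTorch snippet replaces a fully connected head with global average pooling; the tensor shapes and class count are illustrative only.

import torch
import torch.nn as nn

# Global average pooling reduces each feature map to a single value,
# replacing the fully connected layers of earlier architectures.
features = torch.randn(8, 512, 7, 7)                    # (N, C, H, W) backbone output
pooled = nn.AdaptiveAvgPool2d(1)(features).flatten(1)   # (N, C): one value per map
logits = nn.Linear(512, 100)(pooled)                    # e.g. 100 classes, as in CIFAR100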