Abstract

Image classification has become highly significant in the field of computer vision due to its wide array of applications. In recent years, Convolutional Neural Networks (CNN) have emerged as potent tools for addressing this task. Attention mechanisms offer an effective approach to enhance the accuracy of image classification. Despite Global Average Pooling (GAP) being a crucial component of traditional attention mechanisms, it only computes the average of spatial elements in each channel, failing to capture the complete range of feature information, resulting in fewer and less expressive features. To address this limitation, we propose a novel pooling operation named “Binary Pooling” and integrate it into the attention block. Binary pooling combines both GAP and Global Max Pooling (GMP), obtaining a more comprehensive feature vector by extracting average and maximum values, thereby enriching the diversity of extracted image features. Furthermore, to further enhance the extraction of image features, dilation operations and pointwise convolutions are applied on the channel-wise. The proposed attention block is simple yet highly effective. Upon integration into ResNet18/50 models, it leads to accuracy improvements of 2.02%/0.63% on ImageNet.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.