Abstract

The attention mechanism (AM) is a widely used method for improving the performance of convolutional neural networks (CNNs) on computer vision tasks. Despite its pervasiveness, we have a poor understanding of where its effectiveness stems from. It is popularly attributed to the visual attention explanation, i.e., attention weights indicate the importance of features, and AM advocates focusing on the important parts of an input image rather than ingesting the entire input. However, we find only a weak consistency between the attention weights of features and their importance. We verify that feature map multiplication, which brings high-order non-linearity into CNNs, is crucial to the effectiveness of AM. Furthermore, we show that feature map multiplication has an essential impact on the learned surfaces of CNNs. With its high-order non-linearity, feature map multiplication plays a regularization role, making the learned curves smoother and more stable in between real samples (test/training samples in datasets). Thus, compared to vanilla CNNs, CNNs equipped with AM are more robust to noise and yield smaller model sensitivity scores, which is the reason for their better performance.
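
To make the "feature map multiplication" discussed above concrete, below is a minimal sketch of an SE-style channel attention block in PyTorch (the module name, reduction ratio, and layer choices are illustrative assumptions, not the authors' exact architecture). The key point is the final element-wise product: the output is a product of two terms that both depend on the input, which is the high-order non-linearity the abstract refers to.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Illustrative SE-style channel attention block (assumed, not the paper's exact design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average per channel
        self.fc = nn.Sequential(             # excitation: per-channel weights in (0, 1)
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        # Feature map multiplication: both factors depend on x,
        # so the block is higher-order in the input features.
        return x * w

# Usage example: rescale a batch of 64-channel feature maps.
x = torch.randn(8, 64, 32, 32)
out = ChannelAttention(64)(x)
print(out.shape)  # torch.Size([8, 64, 32, 32])
```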
