Abstract

Utilizing channel-wise and spatial attention mechanisms to emphasize specific parts of an input image is an effective way to improve the performance of convolutional neural networks (CNNs). There are multiple effective implementations of attention mechanisms. One is adding squeeze-and-excitation (SE) blocks to the CNN structure; these blocks exploit channel dependencies to selectively emphasize the most informative channels and suppress the less informative ones. Another is adding a convolutional block attention module (CBAM), which implements both channel-wise and spatial attention to select important pixels of the feature maps while emphasizing informative channels. In this paper, we propose an encoder-decoder architecture based on the idea of letting the channel-wise and spatial attention blocks share the same latent-space representation. Instead of separating the channel-wise and spatial attention modules into two independent parts, as in CBAM, we combine them into one encoder-decoder architecture with two outputs. To evaluate the proposed method, we apply it to different CNN architectures and test it on image classification and semantic segmentation. By comparing the resulting structures equipped with MEDA blocks against other attention modules, we show that the proposed method achieves better performance across the different test scenarios.
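To make the shared-latent-space idea concrete, below is a minimal PyTorch sketch, not the authors' MEDA implementation: the module name SharedLatentAttention, the latent dimension, the global-average-pooling encoder, and the 7x7 spatial convolution are all illustrative assumptions. A single encoder compresses the feature map into one latent code, and two decoder heads produce the two outputs, a channel attention vector and a spatial attention map, from that shared code.

```python
import torch
import torch.nn as nn

class SharedLatentAttention(nn.Module):
    """Sketch of an attention block in which channel-wise and spatial
    attention share one encoder-decoder latent space and produce two
    outputs: a channel weight vector and a spatial attention map.
    Hypothetical design, not the published MEDA block."""

    def __init__(self, channels: int, latent_dim: int = 16):
        super().__init__()
        # Encoder: compress pooled channel descriptors into a shared latent code.
        self.encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),           # (B, C, 1, 1) channel descriptor
            nn.Flatten(),                      # (B, C)
            nn.Linear(channels, latent_dim),
            nn.ReLU(inplace=True),
        )
        # Decoder head 1: channel attention weights from the shared latent code.
        self.channel_decoder = nn.Sequential(
            nn.Linear(latent_dim, channels),
            nn.Sigmoid(),
        )
        # Decoder head 2: a latent-conditioned per-channel modulation that,
        # combined with a 7x7 convolution over the feature map, yields a
        # spatial attention map.
        self.spatial_decoder = nn.Linear(latent_dim, channels)
        self.spatial_conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = self.encoder(x)                                # shared latent code
        ch_w = self.channel_decoder(z).view(b, c, 1, 1)    # output 1: channel attention
        sp_mod = self.spatial_decoder(z).view(b, c, 1, 1)  # condition on latent code
        sp_map = self.sigmoid(self.spatial_conv(x * sp_mod))  # output 2: spatial attention
        return x * ch_w * sp_map                           # apply both outputs

# Usage: the block is shape-preserving, so it can be dropped into a CNN stage.
block = SharedLatentAttention(channels=64)
y = block(torch.randn(2, 64, 32, 32))  # y has the same shape as the input
```

The design choice this sketch illustrates is that, unlike CBAM's two independent sub-modules, both attention branches here decode from the same latent vector z, so the channel and spatial outputs are computed from a single shared representation.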
