Abstract

Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.

Highlights

  • IntroductionManuscript received: 2021-12-31; accepted: 2022-01-18 regions of an image and disregarding irrelevant parts are called attention mechanisms; the human visual system uses one [1–4] to assist in analyzing and understanding complex scenes efficiently and effectively

  • Methods for diverting attention to the most importantManuscript received: 2021-12-31; accepted: 2022-01-18 regions of an image and disregarding irrelevant parts are called attention mechanisms; the human visual system uses one [1–4] to assist in analyzing and understanding complex scenes efficiently and effectively

  • Chaudhari et al [141] provided a survey of attention models in deep neural networks which concentrates on their application to natural language processing, while our work focuses on computer vision

Read more

Summary

Introduction

Manuscript received: 2021-12-31; accepted: 2022-01-18 regions of an image and disregarding irrelevant parts are called attention mechanisms; the human visual system uses one [1–4] to assist in analyzing and understanding complex scenes efficiently and effectively. This in turn has inspired researchers to introduce attention mechanisms into computer vision systems to improve their performance. The first phase begins from RAM [31], pioneering work that combined deep neural networks with attention mechanisms It recurrently predicts the important region and updates the whole network in an end-to-end manner through a policy gradient.

Objectives
Methods
Findings
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.