Abstract
The attention mechanism plays a crucial role in human visual experience. In cognitive neuroscience, the receptive field size of visual cortical neurons is known to be modulated by the additive effect of feature-selective and spatial attention. We propose a novel architectural unit, the "Feature-selective and Spatial Receptive Fields" (FSRF) block, that implements adaptive receptive field sizes for neurons through the additive effects of feature-selective and spatial attention. We show that FSRF blocks can be inserted into existing convolutional neural network architectures to form an FSRF network, and we test its generalization capability on different datasets.
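The abstract does not specify the internal design of the FSRF block, but the core idea of additively combining a feature-selective (channel) branch and a spatial branch can be sketched as follows. This is a minimal, hypothetical PyTorch implementation under our own assumptions (a squeeze-and-excitation-style channel branch and a 7×7 convolutional spatial branch), not the authors' exact design:

```python
import torch
import torch.nn as nn


class FSRFBlock(nn.Module):
    """Hypothetical sketch of a Feature-selective and Spatial Receptive
    Fields (FSRF) block: channel (feature-selective) attention and spatial
    attention logits are computed separately, combined additively, and
    used to gate the input feature map. The branch designs here are
    illustrative assumptions, not the paper's exact architecture."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Feature-selective (channel) branch: global pooling + bottleneck MLP
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )
        # Spatial branch: 7x7 conv over mean- and max-pooled channel maps
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention logits, shape (N, C, 1, 1)
        c_att = self.channel_mlp(x)
        # Spatial attention logits, shape (N, 1, H, W)
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        s_att = self.spatial_conv(pooled)
        # Additive combination (broadcasts to N, C, H, W), then gate the input
        att = torch.sigmoid(c_att + s_att)
        return x * att
```

Because the output has the same shape as the input, such a block can be dropped between stages of an existing CNN, which is consistent with the abstract's claim that FSRF blocks can be inserted into existing architectures.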
Highlights
In recent years, the field of computer vision has undergone tremendous changes, with deep learning becoming a powerful tool.
The main contributions of this work are summarized as follows: (1) We propose a simple and effective attention block (FSRF) that can be widely applied to boost the representation power of convolutional neural networks (CNNs); (2) We validate the effectiveness of the Feature-selective and Spatial Receptive Fields (FSRF) block through extensive ablation studies; (3) We demonstrate that the FSRF network (FSRFNet) outperforms previous state-of-the-art models on datasets of different sizes, and successfully embed an FSRF block into lightweight models (e.g., ShuffleNetV2 [38] and MobileNetV2 [39]).
The results show that the FSRF block consistently improves the performance of state-of-the-art attention-based CNNs.
Summary
The field of computer vision has undergone tremendous changes, with deep learning becoming a powerful tool. Owing to their data-driven nature and the availability of massively parallel computing, deep neural networks have achieved state-of-the-art results in most areas, and researchers have designed many advanced network architectures [1,2,3,4,5,6,7,8,9,10,11,12]. With improvements in detection accuracy and real-time performance, deep-learning-based object detection algorithms have gradually developed into two types: the two-stage approach and the one-stage approach. Compared with the two-stage approach [13,16,17,18], the one-stage approach offers better real-time performance while maintaining competitive detection accuracy.