Abstract

Attention mechanisms have shown great potential for improving the performance of deep convolutional neural networks (CNNs). However, many existing methods are dedicated to developing channel or spatial attention modules with large numbers of parameters, and such complex attention modules inevitably increase the computational burden of CNNs. In our experiments embedding the Convolutional Block Attention Module (CBAM) in the light-weight YOLOv5s model, CBAM slows inference and increases model complexity while reducing average precision, whereas the Squeeze-and-Excitation (SE) module, the channel part of CBAM, has a positive effect on the model. To replace the spatial attention module in CBAM and offer a suitable arrangement of channel and spatial attention modules, this paper proposes a Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate attention maps along a channel attention module and a Sharpening Spatial Attention (SSA) module. By introducing a sharpening filter into the spatial attention module, we obtain an SSA module with low complexity. We search for a scheme that combines our SSA module with the SE module or the Efficient Channel Attention (ECA) module and yields the largest improvement in models such as YOLOv5s and YOLOv3-tiny. After various replacement experiments, the best scheme we find is to embed channel attention modules in the backbone and neck of the model and to integrate SSAM into the YOLO head. We verify the positive effect of SSAM on two general object detection datasets, VOC2012 and MS COCO2017: the former for determining a suitable scheme and the latter for demonstrating the versatility of our method in complex scenes. Experimental results on both datasets show clear gains in average precision and detection performance, which demonstrates the usefulness of SSAM in light-weight YOLO models. Furthermore, visualization results show that SSAM enhances the localization ability of the models.
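
To make the architecture described above concrete, the following is a minimal PyTorch-style sketch of a sharpening spatial attention module combined sequentially with an ECA-style channel attention, in the order SSAM arranges them. The paper's exact sharpening kernel, layer ordering and hyper-parameters are not given here, so the fixed 3x3 Laplacian-style kernel, the 7x7 spatial convolution and the module names are illustrative assumptions rather than the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SharpeningSpatialAttention(nn.Module):
    # Spatial attention that emphasises edges by sharpening the pooled maps.
    def __init__(self, kernel_size=7):
        super().__init__()
        # Fixed 3x3 sharpening kernel (an illustrative assumption): identity
        # plus a Laplacian, so edges in the pooled maps are amplified at no
        # learnable cost.
        sharpen = torch.tensor([[ 0., -1.,  0.],
                                [-1.,  5., -1.],
                                [ 0., -1.,  0.]]).view(1, 1, 3, 3)
        self.register_buffer("sharpen", sharpen)
        # CBAM-style 2-to-1 convolution over the pooled spatial descriptors.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_map = torch.mean(x, dim=1, keepdim=True)        # (B, 1, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)      # (B, 1, H, W)
        pooled = torch.cat([avg_map, max_map], dim=1)       # (B, 2, H, W)
        # Apply the fixed sharpening filter depth-wise to both pooled maps.
        pooled = F.conv2d(pooled, self.sharpen.repeat(2, 1, 1, 1), padding=1, groups=2)
        attn = torch.sigmoid(self.conv(pooled))             # (B, 1, H, W)
        return x * attn


class ECAChannelAttention(nn.Module):
    # Efficient Channel Attention: a 1-D convolution across channel descriptors.
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, k_size, padding=k_size // 2, bias=False)

    def forward(self, x):
        w = self.pool(x)                                     # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(-1, -2))       # (B, 1, C)
        w = torch.sigmoid(w.transpose(-1, -2).unsqueeze(-1)) # (B, C, 1, 1)
        return x * w


class SSAM(nn.Module):
    # Channel attention followed by sharpening spatial attention, in sequence.
    def __init__(self):
        super().__init__()
        self.channel = ECAChannelAttention()
        self.spatial = SharpeningSpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))


if __name__ == "__main__":
    x = torch.randn(1, 128, 40, 40)      # e.g. a YOLO head feature map
    print(SSAM()(x).shape)               # torch.Size([1, 128, 40, 40])

In the placement the abstract reports as best, a module like this would sit in the YOLO head, while the backbone and neck would use SE or ECA channel attention alone.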

Highlights

  • Convolutional neural networks have achieved great progress in the field of visual object detection and tracking owing to their rich and expressive feature representations

  • The visualization results suggest that our theoretical derivation of the Sharpening Spatial Attention (SSA) module is sound: embedding the SSA module improves the edge information of large objects

  • We make an empirical assumption about the loss of object edge information and focus on strengthening spatial edge information in deep convolutional neural networks (CNNs) with low computational complexity and few parameters



Introduction

Convolutional neural networks have achieved great progress in the field of visual object detection and tracking owing to their rich and expressive feature representations. Neural-network-based visual object detectors can be divided into one-stage detectors [4,5,6,7,8,9,10] and two-stage detectors. The most representative two-stage object detector is the RCNN [13] series, which generally extracts image features with a feature extraction network, feeds the feature maps into a region proposal network to generate regions of interest as the first prediction, and then performs classification and regression as the second prediction. A one-stage detector, by contrast, performs the object detection task with a single prediction step that combines classification and localization. The clearest direction for studying one-stage detectors is how to modify the network structure to enhance detection accuracy and classification performance while maintaining detection speed.
