Abstract

Objects in natural scenes appear at widely varying scales, and computer vision systems should be robust to these scale changes, so the multi-scale representation of features is an important research topic. Res2Net implements hierarchical multi-scale convolution within residual blocks, but its arbitrary channel grouping weakens the robustness and the intuitive interpretability of the network. We propose a new multi-scale convolution model based on multiple attention mechanisms. It introduces attention into the structure of a Res2-block to better guide feature expression. First, we adopt channel attention to score the channels and sort them in descending order of feature importance (Channels-Sort). The sorted channels are then grouped and hierarchically convolved within the block to form an attention multi-scale block (AMS-block). Second, we apply channel attention to the residual sub-blocks to constitute a dual-attention multi-scale block (DAMS-block). Finally, we introduce spatial attention before the channel sorting to form a multi-attention multi-scale block (MAMS-block). A MAMS-convolutional neural network (CNN) is a cascade of multiple MAMS-blocks. It enables significant information to be expressed at more levels, and it can easily be grafted into different convolutional structures. Limited by hardware conditions, we verify the proposed ideas only with convolutional networks of the same magnitude. The experimental results show that a convolution model combining an attention mechanism with multi-scale features is superior in image classification.
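The Channels-Sort step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes channel importance can be stood in for by global average pooling (a learned channel-attention module would supply the real scores), sorts channels in descending order of that score, and splits the sorted channels into groups for Res2-style hierarchical convolution. The function name and group count are illustrative.

```python
import numpy as np

def channels_sort_groups(feature_map, n_groups=4):
    """Sketch of Channels-Sort followed by grouping (assumed details).

    feature_map: array of shape (C, H, W).
    Returns the descending-importance channel order and the channel groups
    that a hierarchical (Res2-style) convolution would then process.
    """
    # Stand-in channel-attention score: global average pooling per channel.
    # In the paper this score would come from a learned attention module.
    scores = feature_map.mean(axis=(1, 2))

    # Channels-Sort: reorder channels by descending importance.
    order = np.argsort(-scores)
    sorted_map = feature_map[order]

    # Split the sorted channels into n_groups for intra-block
    # hierarchical convolution (each group feeds the next level).
    groups = np.array_split(sorted_map, n_groups, axis=0)
    return order, groups
```

With 8 channels and 4 groups, the most important channels land in the first group, so the hierarchical convolution processes features in order of estimated importance rather than in the arbitrary order Res2Net uses.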

Highlights

  • In recent years, deep learning has made a number of breakthroughs in the fields of computer vision [1,2], natural language processing [3,4], and speech recognition [5,6]

  • The performance of the three proposed models exceeds that of the Res2-convolutional neural network (CNN), SE-CNN, and ResNet

  • Even compared to the AMS-CNN and DAMS-CNN models, MAMS-CNN improves performance by 0.50% and 0.13%, respectively

Summary

Introduction

Deep learning has made a number of breakthroughs in the fields of computer vision [1,2], natural language processing [3,4], and speech recognition [5,6]. As one of the most typical deep learning models, the convolutional neural network (CNN) has made considerable progress in image classification [7,8], object detection [9,10], image retrieval [11,12], and other applications. Several typical CNN models (including AlexNet [13], VGG [14], ResNet [15], etc.) were originally designed for image classification and have further demonstrated their versatility in other image processing tasks. The attention mechanism gives a network the ability to focus on certain things while ignoring others.
