Abstract

How to effectively extract features with high representational power has long been a research topic and a challenge for classification tasks. Most existing methods address the problem by using deep convolutional neural networks as feature extractors. Although a series of excellent network structures have been successful in the field of Chinese ink-wash painting classification, most of them only enlarge the network structure or directly fuse features of different scales, which limits the network's ability to extract semantically rich and scale-invariant feature information and thus hinders further improvement of classification performance. In this paper, a novel model based on multi-level attention and multi-scale feature fusion is proposed. The model first extracts three types of feature maps from the low-level, middle-level, and high-level layers of a pretrained deep neural network. Then, the low-level and middle-level feature maps are processed by a spatial attention module, while the high-level feature maps are processed by a scale invariance module to strengthen their scale-invariance properties. Moreover, a conditional random field module is adopted to fuse the optimized three-scale feature maps, followed by a channel attention module to refine the features. Finally, a multi-level deep supervision strategy is utilized to optimize the model for better performance. To verify the effectiveness of the model, extensive experiments are conducted on the Chinese ink-wash painting dataset created in this work; the results show that the model outperforms other mainstream methods.
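
As a rough illustration of the pipeline the abstract describes, the following PyTorch sketch taps low-, middle-, and high-level feature maps from a pretrained backbone, applies spatial attention to the two lower levels, and applies channel attention after fusion. The ResNet-50 backbone, the layer2/3/4 taps, and all module internals are assumptions made for illustration; in particular, the conditional-random-field fusion is simplified here to a learned weighted sum, and the scale invariance module is omitted.

```python
# Minimal sketch of the multi-level attention pipeline (illustrative only).
# Assumed here: a ResNet-50 backbone with low/mid/high taps at layer2/3/4,
# and a learned weighted sum standing in for the paper's CRF-based fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class SpatialAttention(nn.Module):
    """Reweights spatial locations; applied to the low/mid-level maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)           # channel-wise average
        mx, _ = x.max(dim=1, keepdim=True)          # channel-wise max
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel reweighting (assumed form)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))             # global average pool
        return x * w[:, :, None, None]

class MultiLevelClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        net = resnet50(weights="IMAGENET1K_V2")
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu,
                                  net.maxpool, net.layer1)
        self.low, self.mid, self.high = net.layer2, net.layer3, net.layer4
        self.sa_low, self.sa_mid = SpatialAttention(), SpatialAttention()
        # Project each level to a common width before fusing with learned
        # weights (a simplification of the CRF fusion in the paper).
        self.proj = nn.ModuleList(
            [nn.Conv2d(c, 512, kernel_size=1) for c in (512, 1024, 2048)])
        self.fuse_w = nn.Parameter(torch.ones(3) / 3)
        self.ca = ChannelAttention(512)
        self.head = nn.Linear(512, num_classes)

    def forward(self, x):
        f_low = self.sa_low(self.low(self.stem(x)))
        f_mid = self.sa_mid(self.mid(f_low))
        f_high = self.high(f_mid)        # scale invariance module omitted
        size = f_high.shape[-2:]         # fuse at the coarsest resolution
        feats = [F.adaptive_avg_pool2d(p(f), size)
                 for p, f in zip(self.proj, (f_low, f_mid, f_high))]
        w = torch.softmax(self.fuse_w, dim=0)
        fused = sum(wi * fi for wi, fi in zip(w, feats))
        fused = self.ca(fused).mean(dim=(2, 3))   # refine, then global pool
        return self.head(fused)          # deep supervision heads omitted
```

A full implementation would also attach auxiliary classification heads to the individual levels to realize the multi-level deep supervision strategy the abstract mentions.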

Highlights

  • In recent years, the most effective approaches to visual recognition tasks have been based on complex and deep convolutional neural networks (CNNs), which stack multiple convolution and pooling layers to generate high-level semantic features [1–9], or on other technologies [10, 11]

  • Based on the above discussion, a novel model based on a multi-level attention mechanism and multi-scale fusion is proposed in this paper. The model mainly uses the attention mechanism to process the multi-scale feature maps and designs an elaborate feature fusion strategy to learn more discriminative feature representations for better classification performance

  • Therefore, to find the most effective classifier, this experiment evaluates four classifiers: K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), and support vector machine (SVM); a minimal comparison sketch follows this list
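
As referenced in the last highlight, the scikit-learn sketch below shows one plausible way to run such a classifier comparison on feature vectors `X` with labels `y` already extracted from the network. The hyperparameters and cross-validation setup are illustrative assumptions, not the paper's protocol.

```python
# Illustrative comparison of the four classifiers named above, run on
# deep features already extracted from the network. The arrays (X, y)
# are assumed inputs; hyperparameters are mostly scikit-learn defaults.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def compare_classifiers(X: np.ndarray, y: np.ndarray, cv: int = 5) -> None:
    """Report mean cross-validated accuracy for each candidate classifier."""
    candidates = {
        "KNN": KNeighborsClassifier(),
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "SVM": SVC(kernel="rbf"),
    }
    for name, clf in candidates.items():
        # Standardize features so distance- and margin-based models
        # (KNN, SVM) are not dominated by high-variance dimensions.
        pipe = make_pipeline(StandardScaler(), clf)
        scores = cross_val_score(pipe, X, y, cv=cv)
        print(f"{name}: {scores.mean():.4f} ± {scores.std():.4f}")
```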


Summary

Introduction

The most effective visual recognition approaches are based on complex and deep convolutional neural networks (CNNs), which stack multiple convolution and pooling layers to generate high-level semantic features [1–9], or on other technologies [10, 11]. To further improve classification performance, some works integrate handcrafted features extracted from Chinese ink-wash paintings (IWPs) with those high-level semantic features. Other works [15–17] integrated low-level and high-level features extracted from different layers to improve classification performance. However, these methods ignored the middle-level features, which are complementary and could contribute to the final performance. To address the above problems in the classification of Chinese IWPs, our work explores how to make the best use of the low-level, middle-level, and high-level image features obtained from different layers, and how to take full advantage of the attention mechanism to fuse multi-scale information for better classification performance.
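
To make the multi-layer extraction concrete, the sketch below uses torchvision's feature-extraction utility to tap three levels of a pretrained backbone. The choice of ResNet-50, and of layer2/3/4 as the low, middle, and high taps, is an assumption made for illustration; the summary does not specify the backbone or tap points.

```python
# Sketch of tapping low-, middle-, and high-level feature maps from a
# pretrained backbone. ResNet-50 and the layer2/3/4 tap points are
# illustrative assumptions, not the paper's stated configuration.
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

backbone = resnet50(weights="IMAGENET1K_V2")
extractor = create_feature_extractor(
    backbone,
    return_nodes={"layer2": "low", "layer3": "mid", "layer4": "high"})

x = torch.randn(1, 3, 224, 224)   # stand-in for an ink-wash painting
feats = extractor(x)
for level, f in feats.items():
    print(level, tuple(f.shape))
# low  (1, 512, 28, 28)   -> fine strokes and texture
# mid  (1, 1024, 14, 14)  -> part-level structure
# high (1, 2048, 7, 7)    -> global semantics
```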

Related Work
Deeper Architecture Design
Attention Mechanism
Proposed Architecture
Multiscale Feature Extraction
Spatial Attention Module
Scale Invariance Processing
Multiscale Feature Fusion Based on Conditional Random Field
Channel Attention Module
Multilevel Deep Supervision
Experimental
Performance Analysis of Different Classifiers
Performance Analysis of Different Methods
Ablation Study
Conclusions
