As artwork collections are digitized, more and more artwork images are becoming available to the public. In the iMet Collection Dataset, which is part of the FGVC6 workshop at CVPR 2019, each artwork image is annotated with multiple labels. Because the differences between artwork images with different attribute labels are local and subtle, artwork attribute recognition can be treated as a fine-grained visual categorization (FGVC) task. In this paper, a multi-layer and multi-order network (MLMO-Net) is proposed to capture both first- and second-order information in artwork images. First-order information characterizes global spatial information, and second-order information characterizes local statistical information. MLMO-Net aggregates both first- and second-order information from multiple layers. In our experiments, several convolutional neural network (CNN) architectures, such as VGG16 and ResNet50, and recent FGVC methods, such as the bilinear CNN, the hierarchical bilinear CNN, and the Navigator-Teacher-Scrutinizer network (NTS-Net), are tested on the iMet Collection 2019 Dataset. Experimental results show that MLMO-Net achieves improvements over these baseline methods. This work suggests a direction for improving the performance of artwork attribute recognition.
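To make the idea of multi-layer, multi-order aggregation concrete, the following is a minimal sketch and not the MLMO-Net implementation. The abstract does not state which pooling operators are used, so the sketch assumes a per-channel mean as a stand-in for the first-order descriptor and an averaged outer product of channels (bilinear pooling, as in the bilinear CNN baseline) as the second-order descriptor. All function names and the toy feature maps are hypothetical.

```python
from typing import List

# A feature map is C channels, each a flat list of N spatial activations.
FeatureMap = List[List[float]]


def first_order(fmap: FeatureMap) -> List[float]:
    """Assumed first-order descriptor: mean activation per channel."""
    return [sum(ch) / len(ch) for ch in fmap]


def second_order(fmap: FeatureMap) -> List[float]:
    """Assumed second-order descriptor: averaged channel outer product, flattened to C*C."""
    n = len(fmap[0])
    return [
        sum(a * b for a, b in zip(ci, cj)) / n
        for ci in fmap
        for cj in fmap
    ]


def multi_layer_multi_order(layers: List[FeatureMap]) -> List[float]:
    """Concatenate first- and second-order descriptors from every layer."""
    descriptor: List[float] = []
    for fmap in layers:
        descriptor += first_order(fmap) + second_order(fmap)
    return descriptor


# Two hypothetical layers: 2 channels x 4 positions, and 3 channels x 2 positions.
layer_a = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
layer_b = [[1.0, 1.0], [2.0, 0.0], [0.0, 4.0]]
features = multi_layer_multi_order([layer_a, layer_b])
print(len(features))
```

In a multi-label setting such as iMet, a concatenated descriptor of this kind would typically feed a classifier with one independent output per attribute label; a practical version would operate on CNN feature tensors in a deep learning framework rather than on Python lists.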