End-to-End Automatic Image Annotation Based on Deep CNN and Multi-Label Data Augmentation

Xiao Ke,Jiawei Zou,Yuzhen Niu

doi:10.1109/tmm.2019.2895511

Abstract

Automatic image annotation is a key step in image retrieval and image understanding. In this paper, we present an end-to-end automatic image annotation method based on a deep convolutional neural network (CNN) and multi-label data augmentation. Different from traditional annotation models that usually perform feature extraction and annotation as two independent tasks, we propose an end-to-end automatic image annotation model based on deep CNN (E2E-DCNN). E2E-DCNN transforms the image annotation problem into a multi-label learning problem. It uses a deep CNN structure to carry out the adaptive feature learning before constructing the end-to-end annotation structure using multiple cross-entropy loss functions for training. It is difficult to train a deep CNN model using small-scale datasets or scale up multi-label datasets using traditional data augmentation methods; hence, we propose a multi-label data augmentation method based on Wasserstein generative adversarial networks (ML-WGAN). The ML-WGAN generator can approximate the data distribution of a single multi-label image. The images generated by ML-WGAN can assist in the reduction of the over-fitting problem of training a deep CNN model and enhance the generalization ability of the trained CNN model. We optimize the network structure by using deformable convolution and spatial pyramid pooling. We experiment the proposed E2E-DCNN model with data augmentation by the proposed ML-WGAN on several public datasets. The experimental results demonstrate that the proposed model outperforms the state-of-the-art automatic image annotation models.

Full Text