Automatic image annotation method based on a convolutional neural network with threshold optimization

Jianfang Cao, Aidi Zhao, Zibang Zhang, Robertas Damasevicius

doi:10.1371/journal.pone.0238956

Open Access

Abstract

In this study, a convolutional neural network with threshold optimization (CNN-THOP) is proposed to address the overlabeling and underlabeling that arise in multilabel image annotation when labels are assigned by a ranking function over the prediction probabilities. The model integrates a threshold optimization algorithm into the CNN structure. First, batch normalization (BN) is added to the CNN structure to accelerate convergence, and the optimal model obtained by training the CNN is used to predict the test set images, yielding a group of prediction probabilities. Second, threshold optimization is performed on these prediction probabilities to derive an optimal threshold for each label class, forming a group of optimal thresholds. When the prediction probability for a label class is greater than or equal to the corresponding optimal threshold, that label is included in the annotation result for the image. During annotation, a new image is given its multilabel annotation by loading the optimal model and the optimal thresholds. Verification experiments are performed on the MIML, COREL5K, and MSRC datasets. Compared with the MBRM, the CNN-THOP increases the average precision on MIML, COREL5K, and MSRC by 27%, 28%, and 33%, respectively. Compared with the E2E-DCNN, the CNN-THOP increases the average recall rate by 3% on both COREL5K and MSRC. The CNN-THOP achieves its most precise annotation on the MIML dataset, where the complete matching degree reaches 64.8%.
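The annotation rule described in the abstract keeps a label when its prediction probability is greater than or equal to that class's optimal threshold. It can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which criterion the threshold optimization maximizes, so the code assumes a per-class grid search over candidate thresholds on held-out predictions that maximizes F1. The function names (`best_threshold`, `optimal_thresholds`, `annotate`), the candidate grid, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def best_threshold(probs: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest F1 for one label class.

    Assumption (not from the paper): F1 is the optimization criterion.
    Ties keep the earliest (smallest) candidate because only strict
    improvements replace the current best.
    """
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def optimal_thresholds(prob_matrix: Sequence[Sequence[float]],
                       label_matrix: Sequence[Sequence[int]],
                       candidates: Sequence[float]) -> List[float]:
    """One threshold per label class (column), each optimized independently."""
    n_classes = len(prob_matrix[0])
    return [
        best_threshold([row[c] for row in prob_matrix],
                       [row[c] for row in label_matrix],
                       candidates)
        for c in range(n_classes)
    ]


def annotate(probs: Sequence[float], thresholds: Sequence[float],
             names: Sequence[str]) -> List[str]:
    """Keep every label whose probability is >= its class threshold."""
    return [n for p, t, n in zip(probs, thresholds, names) if p >= t]


# Hypothetical held-out predictions (rows = images, columns = label classes).
VAL_PROBS = [[0.9, 0.2], [0.6, 0.4], [0.3, 0.35], [0.1, 0.8]]
VAL_LABELS = [[1, 0], [1, 1], [0, 1], [0, 1]]
CANDIDATES = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

if __name__ == "__main__":
    thresholds = optimal_thresholds(VAL_PROBS, VAL_LABELS, CANDIDATES)
    print([round(t, 2) for t in thresholds])                  # [0.35, 0.25]
    print(annotate([0.5, 0.3], thresholds, ["sky", "tree"]))  # ['sky', 'tree']
```

Because each class gets its own threshold, the number of labels per image is not fixed in advance. With the thresholds above, an image scored `[0.2, 0.1]` would receive no labels at all, while `[0.5, 0.3]` receives two.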

Highlights

To verify the effectiveness of the convolutional neural network (CNN) [22] with threshold optimization (CNN-THOP) proposed in this study for image annotation, we use free, publicly available datasets: MIML [30], a natural-scene dataset provided by the Learning and Mining from Data (LAMDA) group of Nanjing University; COREL5K [31], collated by the Corel Company; and MSRC [32], from Microsoft Research Cambridge.
To address the fixed number of labels that results from annotating images according to a ranking function during multilabel image annotation, we propose applying a CNN-THOP to image annotation.
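The problem with ranking-based annotation that always assigns a fixed number of labels can be illustrated with a toy example. This sketch contrasts top-k selection with per-class thresholding. The helper names, the label vocabulary, the probabilities, and the uniform 0.5 thresholds are all hypothetical. "Complete matching degree" is assumed here to mean the fraction of images whose predicted label set exactly equals the ground-truth set.

```python
from typing import List, Sequence, Set


def top_k_labels(probs: Sequence[float], names: Sequence[str], k: int) -> List[str]:
    """Ranking-based annotation: always return the k highest-scoring labels."""
    ranked = sorted(zip(probs, names), key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]


def threshold_labels(probs: Sequence[float], names: Sequence[str],
                     thresholds: Sequence[float]) -> List[str]:
    """Threshold-based annotation: keep each label whose probability >= its threshold."""
    return [n for p, t, n in zip(probs, thresholds, names) if p >= t]


def complete_matching_degree(predicted: Sequence[List[str]],
                             truth: Sequence[Set[str]]) -> float:
    """Assumed definition: share of images whose predicted set equals the true set."""
    matches = sum(1 for pred, true in zip(predicted, truth) if set(pred) == true)
    return matches / len(truth)


# Hypothetical toy data: two images, four label classes.
NAMES = ["sky", "water", "tree", "mountain"]
THRESHOLDS = [0.5, 0.5, 0.5, 0.5]  # illustrative values, not optimized
IMAGES = [
    ([0.95, 0.10, 0.08, 0.05], {"sky"}),                   # one true label
    ([0.90, 0.85, 0.70, 0.20], {"sky", "water", "tree"}),  # three true labels
]

if __name__ == "__main__":
    truths = [truth for _, truth in IMAGES]
    ranked = [top_k_labels(p, NAMES, k=2) for p, _ in IMAGES]
    thresholded = [threshold_labels(p, NAMES, THRESHOLDS) for p, _ in IMAGES]
    print(ranked)       # [['sky', 'water'], ['sky', 'water']]
    print(thresholded)  # [['sky'], ['sky', 'water', 'tree']]
    print(complete_matching_degree(ranked, truths))       # 0.0
    print(complete_matching_degree(thresholded, truths))  # 1.0
```

With k = 2, the first image is overlabeled ("water" is added) and the second is underlabeled ("tree" is dropped). Thresholding lets the label count follow the probabilities instead.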

Summary

Introduction

With the continued development of network technology and the growing popularity of multimedia devices, the volume of image data on networks is growing at an exponential rate. Taking WeChat (a messaging application) as an example, more than a hundred million images are uploaded to WeChat Moments every day [1]. In this era of information explosion, organizing and retrieving unlabeled images has become an active research topic in image management [2]. Unlike the single-label images used in earlier classification work, most images today carry rich semantic content, and a typical image is described by several keywords or labels [3].

Methods

Results

Discussion

Conclusion