Abstract

This study addresses the overfitting problem caused by insufficient labeled images in the field of automatic image annotation. We propose a transfer learning model, CNN-2L, that incorporates the label localization strategy described in this study. The model consists of an InceptionV3 network pretrained on the ImageNet dataset and a label localization algorithm. First, the pretrained InceptionV3 network extracts features from the target dataset, which are used to train a task-specific classifier and fine-tune the entire network to obtain an optimal model. The resulting model is then used to derive the probabilities of the predicted labels. To this end, we introduce a squeeze-and-excitation (SE) module into the network architecture, which amplifies useful feature information, suppresses useless feature information, and performs feature reweighting. Next, we apply label localization to the label probabilities to determine the final label set for each image. During this process, the number of labels must be determined: an optimal value of K, obtained experimentally, fixes the number of predicted labels, thereby avoiding the empty label sets that arise when all of an image's predicted label probabilities fall below a fixed threshold. Experiments on the Corel5k multilabel image dataset verify that CNN-2L improves labeling precision by 18% and 15% over the traditional multiple-Bernoulli relevance model (MBRM) and joint equal contribution (JEC) algorithms, respectively, and improves recall by 6% over JEC. It also improves precision by 20% and 11% over the deep learning methods Weight-KNN and adaptive hypergraph learning (AHL), respectively. Although CNN-2L does not improve recall relative to the semantic extension model (SEM), it improves the comprehensive F1 index by 1%.
The experimental results reveal that the proposed transfer learning model based on a label localization strategy is effective for automatic image annotation and substantially improves multilabel annotation performance.
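The core of the label localization step can be illustrated with a minimal sketch: instead of keeping labels whose predicted probability exceeds a fixed threshold (which can leave an image with no labels at all), the top K most probable labels are always selected. The function names, label vocabulary, and probability values below are hypothetical, chosen only to demonstrate the empty-label-set problem and its top-K remedy; they are not taken from the paper's implementation.

```python
def threshold_labels(probs, labels, t=0.5):
    """Fixed-threshold selection: may yield an empty label set."""
    return [l for l, p in zip(labels, probs) if p >= t]

def top_k_labels(probs, labels, k=3):
    """Label localization: always returns exactly k labels, ranked by probability."""
    ranked = sorted(zip(labels, probs), key=lambda lp: lp[1], reverse=True)
    return [l for l, _ in ranked[:k]]

# Hypothetical per-image label probabilities (all below the 0.5 threshold).
labels = ["sky", "water", "tree", "grass", "people"]
probs = [0.42, 0.31, 0.12, 0.09, 0.06]

print(threshold_labels(probs, labels))   # -> []
print(top_k_labels(probs, labels, k=3))  # -> ['sky', 'water', 'tree']
```

In this example, thresholding at 0.5 returns an empty set because no single label dominates, whereas top-K selection with the experimentally chosen K always yields a fixed-size label set.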

Highlights

  • The development of multimedia technology has greatly increased the volume of all types of multimedia data, particularly images

  • Although deep learning offers a remarkable performance advantage for image classification, in practice existing multilabel image datasets are too small, which causes overfitting during training and makes it difficult to take full advantage of deep networks. To address the information loss caused by hand-crafted features in traditional machine learning, the insufficient datasets available for deep learning, and the empty prediction label sets caused by a fixed threshold, we propose a transfer learning model based on a label localization strategy

  • To verify the effectiveness of the transfer learning model based on the label localization strategy for automatic image annotation, we use the benchmark Corel5k [26] dataset collected by Corel and the MIML natural-scene image dataset provided by the Institute of Machine Learning and Data Mining of Nanjing University


Introduction

The development of multimedia technology has increased the volume of all types of multimedia data. As the main representative of multimedia data, images have been the primary focus of many studies, and methods for classifying single objects in images have become highly sophisticated. In real life, however, images often contain multiple objects, and a single keyword is frequently insufficient to represent an image's semantics. The field of multilabel image annotation has emerged to solve this problem: by assigning multiple labels to an image, the labels more accurately capture the true image semantics and better match the real world.

