Abstract
Automatic image annotation (AIA) is a key technology in image understanding and pattern recognition, and is becoming increasingly important for annotating large-scale image collections. Over the past decade, nearest neighbor-based AIA methods have proved to be the most successful among the classical models. These methods face four major challenges: the semantic gap, label imbalance, a wide range of labels, and weak labeling. In this paper, we propose a novel annotation model based on a three-pass KNN (k-Nearest Neighbor) scheme to address these challenges. The key idea is to identify appropriate neighbors at each KNN pass. In the first pass, we identify the most relevant categories based on label features rather than the visual features used by traditional models. In the second pass, we determine the relevant images based on multi-modal (visual and textual label) embedding features. Because a test image has not yet been annotated with any label, we propose a pre-annotation strategy, applied before annotation, to raise its semantic level. In the third pass, we capture relevant labels from semantically and visually similar images and propagate them to the given unlabeled image. In contrast to traditional nearest neighbor-based methods, our method inherently alleviates the problems of the semantic gap, label imbalance, and the wide range of labels. In addition, to alleviate the issue of weak labeling, we propose label refinement for the training images. Extensive experiments on three classical benchmark datasets and MS-COCO demonstrate that the proposed method significantly outperforms the state of the art in terms of both per-label and per-image metrics.
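As a rough illustration of how the three passes fit together, the NumPy sketch below selects candidate categories in label-feature space, then candidate images in the joint embedding space, and finally propagates labels from the surviving neighbors. All function and variable names, the Euclidean distances, and the exponential weighting are assumptions made for illustration; the paper's pre-annotation step, KCCA embedding, and label refinement are not reproduced here.

```python
import numpy as np

def three_pass_knn_annotate(test_emb, cat_label_feats, img_embs, img_cats,
                            img_labels, n_labels, k_cat=3, k_img=5):
    """Annotate one test image with n_labels labels (illustrative sketch).

    test_emb        : (d,)   joint embedding of the (pre-annotated) test image
    cat_label_feats : (C, d) mean label feature of each category
    img_embs        : (N, d) joint embeddings of the training images
    img_cats        : (N,)   category index of each training image
    img_labels      : (N, V) binary label matrix of the training images
    """
    # Pass 1: choose the k_cat categories whose label features lie closest
    # to the test embedding (label-feature space, not raw visual space).
    cat_dist = np.linalg.norm(cat_label_feats - test_emb, axis=1)
    top_cats = np.argsort(cat_dist)[:k_cat]

    # Pass 2: restrict to training images from those categories and keep
    # the k_img nearest ones in the joint (visual + textual) embedding space.
    cand = np.flatnonzero(np.isin(img_cats, top_cats))
    img_dist = np.linalg.norm(img_embs[cand] - test_emb, axis=1)
    neighbors = cand[np.argsort(img_dist)[:k_img]]

    # Pass 3: propagate the neighbors' labels, weighted by their similarity,
    # and return the n_labels highest-scoring labels.
    weights = np.exp(-np.linalg.norm(img_embs[neighbors] - test_emb, axis=1))
    scores = weights @ img_labels[neighbors]
    return np.argsort(scores)[::-1][:n_labels]
```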
Highlights
With the prevalence of digital photography and social networks in our daily lives, billions of images are generated and shared on the Internet
Significant advances have been achieved on large-scale image recognition tasks [8], with deep learning models such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs)
To resolve the problems of weak labeling and label imbalance, we propose a novel image annotation method based on nearest neighbors
Summary
With the prevalence of digital photography and social networks in our daily lives, billions of images are generated and shared on the Internet. To resolve the problems of weak labeling and label imbalance, we propose a novel image annotation method based on nearest neighbors. Rather than working in the traditional visual feature space, the proposed method refines the labels of all training images in the label feature space, which inherently addresses the semantic gap. It maps visual feature vectors extracted by a deep learning architecture (pre-trained VGG-16) and the refined label vectors into a common feature space with the KCCA model.

B. LABEL REFINEMENT

To alleviate the shortcoming of weak labeling, most methods devise sophisticated models with expensive time and space costs in the annotation process. Tang proposed a tri-clustered tensor completion framework that collaboratively explores these three kinds of information to improve social image tag refinement [32], and later a Social anchor-Unit GrAph Regularized Tensor Completion (SUGAR-TC) method that efficiently refines the tags of social images and is insensitive to the scale of the data [33]. A category k's feature is defined as the mean of the label features of all images in that category, denoted as:
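Given the stated definition, in which a category's feature is the mean of the label features of its member images, one plausible rendering is the following; the notation ($I_k$ for the set of training images in category $k$, $\mathbf{l}_i$ for the refined label feature vector of image $i$, $\mathbf{c}_k$ for the category feature) is assumed here rather than taken from the paper:

$$\mathbf{c}_k = \frac{1}{|I_k|} \sum_{i \in I_k} \mathbf{l}_i$$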