Abstract
Nowadays, with the rapid growth of imaging devices and social networks, huge volumes of image data are produced and shared on social media. Social image annotation has become an important and challenging task in the fields of computer vision and machine learning, as it facilitates large-scale image retrieval, indexing, and management. The four main challenges of social image annotation are the semantic gap, tag refinement, label imbalance, and annotation efficiency. To address these issues, we propose an efficient and effective annotation method based on the Mean of Positive Examples (MPE) corresponding to each label. First, we refine user-provided noisy tags through our proposed local smoothing process and treat the refined tags as key features, in contrast to previous methods that treat them as side information, which significantly improves annotation performance. Second, we propose a weighted trans-media similarity measure that fuses information from all modalities when identifying proper neighbors, which raises the semantic level and eases image annotation. Third, our MPE model gives equal importance to all labels, thereby improving the annotation performance of infrequent labels without sacrificing that of frequent labels. Fourth, our MPE model dramatically reduces space and time overheads, since the time cost of annotating an image is unaffected by the size of the training image dataset and depends only on the size of the label vocabulary. Therefore, our proposed method can be applied to real-world large-scale online social image repositories. Extensive experiments on two benchmark datasets demonstrate the effectiveness and efficiency of our MPE model.
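As a rough illustration of why annotation cost scales with the label vocabulary rather than the training set, the sketch below (a minimal Python/NumPy sketch with hypothetical function names, not the authors' exact implementation) builds one mean vector per label from its positive training examples in a joint embedding space and scores a new image against only those |vocabulary| prototypes.

```python
import numpy as np

def label_means(embeddings, labels, num_labels):
    """Mean of Positive Examples: one prototype vector per label.

    embeddings : (n_train, d) image representations in the joint (e.g. CCA) space
    labels     : (n_train, num_labels) binary ground-truth label matrix
    """
    means = np.zeros((num_labels, embeddings.shape[1]))
    for c in range(num_labels):
        pos = embeddings[labels[:, c] == 1]       # positive examples of label c
        if len(pos) > 0:
            means[c] = pos.mean(axis=0)           # every label treated equally
    return means

def annotate(query, means, top_k=5):
    """Score a query embedding against each label prototype by cosine similarity.

    Cost is O(num_labels * d), independent of the number of training images.
    """
    q = query / (np.linalg.norm(query) + 1e-12)
    m = means / (np.linalg.norm(means, axis=1, keepdims=True) + 1e-12)
    scores = m @ q
    return np.argsort(-scores)[:top_k]            # indices of the top-k labels
```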
Highlights
With the ever-growing popularity of digital photography and social media (e.g., Facebook, YouTube, Twitter, and WeChat), billions of social images associated with user-provided tags are generated and shared on social networks, which has made social image annotation one of the most important problems in the fields of computer vision and machine learning [1].
To tackle the above problems, in this paper, we propose a novel annotation approach based on representative features of labels, generated as the mean vectors of positive examples in a canonical correlation analysis (CCA) space jointly learned from visual features and textual tag features.
Several models exploit the direct correlations of multi-modal information and embed the visual features and tag features into a common space based on canonical correlation analysis (CCA) or kernel CCA (KCCA), which aims to reduce the semantic gap and to improve annotation performance [6], [34] (a minimal sketch of such a joint embedding follows below).
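The following is a minimal sketch of a CCA-based joint embedding using scikit-learn, not the authors' exact pipeline; the matrix names, feature dimensionalities, and the use of random placeholder data are illustrative assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Illustrative shapes: 1000 training images, 128-d visual features, 200-d tag features.
rng = np.random.default_rng(0)
V = rng.standard_normal((1000, 128))   # visual feature matrix (one row per image)
T = rng.standard_normal((1000, 200))   # tag feature matrix (e.g. refined tag vectors)

cca = CCA(n_components=32, max_iter=500)
cca.fit(V, T)                          # learn maximally correlated projections

V_c, T_c = cca.transform(V, T)         # both modalities now live in one 32-d space
joint = np.hstack([V_c, T_c])          # a simple fusion of the two projected views
```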
Summary
With the ever-growing popularity of digital photography and social media (e.g., Facebook, YouTube, Twitter, and WeChat), billions of social images associated with user-provided tags are generated and shared on social networks, which has made social image annotation one of the most important problems in the fields of computer vision and machine learning [1]. Several models exploit the direct correlations of multi-modal information and embed the visual features and tag features into a common space based on canonical correlation analysis (CCA) or kernel CCA (KCCA), which aims to reduce the semantic gap and to improve annotation performance [6], [34]. These approaches demonstrate the benefit of exploiting side information and multi-modal correlations between visual features and labels, but they rely only on ground-truth annotations [11]. The weighted trans-media similarity between two images I and J combines their visual and textual similarities, where V(I) and V(J) denote the visual feature vectors of images I and J, respectively, K1 and K2 are transform factors, and T(I) and T(J) denote the corresponding tag feature vectors.
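The passage above describes the terms of the weighted trans-media similarity, but the formula itself did not survive extraction; a plausible reading, consistent with the description, is S(I, J) = K1 · sim(V(I), V(J)) + K2 · sim(T(I), T(J)). The sketch below implements that reading with cosine similarity; the function names and the choice of cosine as the per-modality similarity are assumptions.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def trans_media_similarity(v_i, t_i, v_j, t_j, k1=0.5, k2=0.5):
    """Assumed form of the weighted trans-media similarity:
        S(I, J) = K1 * sim(V(I), V(J)) + K2 * sim(T(I), T(J))
    where v_* are visual feature vectors and t_* are tag feature vectors.
    """
    return k1 * cosine(v_i, v_j) + k2 * cosine(t_i, t_j)
```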