Abstract

Self-supervised contrastive learning offers a means of learning informative features from a pool of unlabeled data. In this paper, we propose a coreset selection method that requires no labels at all. Among self-supervised methods, contrastive learning has recently and consistently delivered the highest performance, which prompted us to adopt two leading contrastive learning methods: the simple framework for contrastive learning of visual representations (SimCLR) and the momentum contrast (MoCo) framework. At every epoch of contrastive learning, we calculated the cosine similarity for each example and accumulated these values over the entire training run to obtain a coreset score. Our assumption was that a sample with low accumulated similarity would likely behave as a coreset sample. Compared with existing label-based coreset selection methods, our approach reduces the cost associated with human annotation. In this study, the unsupervised method implemented for coreset selection achieved improvements of 1.25% (for CIFAR10), 0.82% (for SVHN), and 0.19% (for QMNIST) over a randomly selected subset with a size of 30%. Furthermore, our results are comparable to those of existing supervised coreset selection methods. The differences between the proposed method and a representative supervised coreset selection method (forgetting events) were 0.81% on the CIFAR10 dataset, −2.08% on the SVHN dataset (the proposed method outperformed the existing method), and 0.01% on the QMNIST dataset at a subset size of 30%. In addition, our proposed approach remained robust even when the coreset selection model and target model were not identical (e.g., using ResNet18 as the selection model and ResNet101 as the target model).
Lastly, we obtained more concrete evidence that our coreset examples are highly informative by showing the performance gap between the coreset and non-coreset samples in the coreset cross-test experiment. We observed performance pairs of the form ((testing: non-coreset, training: coreset), (testing: coreset, training: non-coreset)), i.e., (94.27%, 67.39%) for CIFAR10, (98.24%, 83.30%) for SVHN, and (99.89%, 93.07%) for QMNIST with a subset size of 30%.
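The scoring procedure described above (accumulate per-epoch cosine similarities, then keep the lowest-scoring examples) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes you already have, for each epoch, the embeddings of the two augmented views of every example, and the names `select_coreset` and `subset_ratio` are ours.

```python
import numpy as np

def cosine_similarity(a, b):
    # Row-wise cosine similarity between two embedding matrices
    # of shape (n_samples, embedding_dim).
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_norm * b_norm, axis=1)

def select_coreset(embeddings_per_epoch, subset_ratio=0.3):
    """Accumulate per-epoch cosine similarities between each example's
    two augmented views and return the indices of the lowest-scoring
    examples as the coreset.

    embeddings_per_epoch: list of (view_a, view_b) embedding pairs,
    one pair of (n_samples, dim) arrays per training epoch.
    """
    n_samples = embeddings_per_epoch[0][0].shape[0]
    scores = np.zeros(n_samples)
    for view_a, view_b in embeddings_per_epoch:
        scores += cosine_similarity(view_a, view_b)
    k = int(n_samples * subset_ratio)
    # Low accumulated similarity -> harder, presumably more
    # informative examples -> coreset.
    return np.argsort(scores)[:k]
```

Under this assumption, an example whose two augmented views stay dissimilar throughout training accumulates a low score and is selected, which mirrors the paper's intuition that low-similarity samples are the informative ones.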

Highlights

  • Deep learning-based methods have been highly effective in performing computer vision tasks such as image classification [1], object detection [2], and semantic segmentation [3]

  • The results of our study demonstrate that contrastive learning can extend to unsupervised coreset selection

  • We plotted the distribution of the average cosine similarity (cossim) for each dataset, as shown in Fig. 9. It is evident that the datasets, in order of decreasing informativeness, are CIFAR10, SVHN, and QMNIST, as they consist of RGB object images, RGB digit images, and grayscale digit images, respectively



Introduction

Deep learning-based methods have been highly effective in performing computer vision tasks such as image classification [1], object detection [2], and semantic segmentation [3]. These methods generally require large amounts of data to produce accurate results; in particular, human annotation, an essential part of supervised learning, can be costly and labor-intensive. In other words, when building a new training dataset for deep learning, we should consider the following constraints: i) huge annotation costs (we cannot afford to annotate all of a huge number of unlabeled examples), ii) limited storage, and iii) limited computational power. These limitations grow linearly with the number of examples. Traditional methods mainly perform random selection, which is likely to miss the most informative examples.

