Abstract

Modern deep neural network (DNN)-based approaches have delivered great performance on computer vision tasks; however, they incur a massive annotation cost due to their data-hungry nature. Hence, given a fixed budget and a pool of unlabeled examples, improving the quality of the examples to be annotated is a sensible step toward good DNN generalization. One of the key issues that can hurt example quality is redundancy, in which many examples exhibit similar visual context (e.g., the same background). Redundant examples barely contribute to performance yet still incur annotation cost, so identifying redundancy before the annotation process is a key step toward avoiding unnecessary expense. In this work, we show that the coreset score based on cosine similarity (cossim) is effective for identifying redundant examples. This is because the collective gradient magnitude over redundant examples is large compared to that over the other examples; as a result, contrastive learning reduces the loss on the redundant examples first, so the cossim of the redundant set becomes high (i.e., its coreset score becomes low). To our knowledge, we are the first to view redundancy identification through the lens of gradient magnitude. In this way, we effectively removed redundant examples from two datasets (KITTI, BDD10K), yielding better performance on detection and semantic segmentation.
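To make the scoring concrete, below is a minimal sketch of a cossim-based coreset score. It assumes the embeddings come from a contrastively trained encoder (e.g., a SimCLR-style model); the exact score definition used here, 1 minus the nearest-neighbor cossim, is our illustrative reading of the description above, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def coreset_scores(embeddings: torch.Tensor) -> torch.Tensor:
    """Per-example coreset score: low score = likely redundant.

    embeddings: (N, D) features from a contrastively trained encoder.
    """
    # Normalize rows so the inner product equals cosine similarity.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T                      # (N, N) pairwise cossim
    sim.fill_diagonal_(-1.0)           # exclude self-similarity
    # High similarity to some neighbor -> redundant -> low score.
    return 1.0 - sim.max(dim=1).values

# Toy usage: three distinct examples plus a near-duplicate of example 0.
feats = torch.randn(3, 128)
feats = torch.cat([feats, feats[:1] + 0.01 * torch.randn(1, 128)])
print(coreset_scores(feats))  # examples 0 and 3 receive the lowest scores
```

In this reading, near-duplicates sit close together in embedding space, so their nearest-neighbor cossim is high and their coreset score is low, flagging them for removal before annotation.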

Highlights

  • Deep-learning-based approaches have been a key technique in various computer vision tasks, such as image classification [1], object detection [2], and image segmentation [3].

  • We provided a theoretical explanation of why the coreset score established by contrastive learning is effective for capturing redundant examples.

  • We focused on semantic segmentation tasks and used the validation set as the test set because the annotations of the test set are inaccessible.

Summary

Introduction

Deep-learning-based approaches have been a key technique in various computer vision tasks, such as image classification [1], object detection [2], and image segmentation [3]. To reduce the annotation cost while achieving great performance, assessing the quality of unlabeled examples is a key step. Unlabeled pools often contain redundant examples that share similar visual context; in terms of annotation cost, these redundant examples require a higher annotation budget while exhibiting less fruitful features. Yan et al. [6] recently achieved key-frame detection in a self-supervised manner, that is, at zero annotation cost, and verified their method through thorough experiments on an action recognition dataset. Ju et al. [7] achieved unsupervised coreset selection using a coreset score established by contrastive learning; their goal was to identify the subset of unlabeled data that contributes most to performance.
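As an illustration of that selection step (the budget value and the score array below are assumptions for the example, not values from [7]), choosing the subset to annotate can be as simple as ranking the unlabeled pool by coreset score and keeping the highest-scoring examples:

```python
import numpy as np

def select_for_annotation(scores: np.ndarray, budget: int) -> np.ndarray:
    # Keep the `budget` highest-scoring (least redundant) examples.
    return np.argsort(scores)[::-1][:budget]

scores = np.array([0.02, 0.80, 0.75, 0.03, 0.60])  # two near-duplicates score low
print(select_for_annotation(scores, budget=3))     # -> [1 2 4]
```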
