Abstract

Modern deep neural network (DNN)-based approaches have delivered great performance on computer vision tasks; however, they incur a massive annotation cost due to their data-hungry nature. Hence, given a fixed budget and a pool of unlabeled examples, improving the quality of the examples to be annotated is a sensible step toward good DNN generalization. One of the key issues that can hurt example quality is redundancy, in which many examples exhibit similar visual context (e.g., the same background). Redundant examples barely contribute to performance yet still incur annotation cost, so identifying redundancy before the annotation process is a key step toward avoiding unnecessary expense. In this work, we show that the coreset score based on cosine similarity (cossim) is effective for identifying redundant examples. This is because the collective gradient magnitude over redundant examples is large compared to that over the other examples; as a result, contrastive learning reduces the loss on the redundant examples first, so the cossim of the redundant set becomes high (i.e., its coreset score becomes low). To our knowledge, we are the first to view redundancy identification through the lens of gradient magnitude. In this way, we effectively removed redundant examples from two datasets (KITTI, BDD10K), yielding better performance on detection and semantic segmentation.
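To make the scoring concrete, below is a minimal sketch of a cossim-based coreset score. It assumes the embeddings come from a contrastively trained encoder (e.g., a SimCLR-style model); the exact score definition used here, 1 minus the nearest-neighbor cossim, is our illustrative reading of the description above, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def coreset_scores(embeddings: torch.Tensor) -> torch.Tensor:
    """Per-example coreset score: low score = likely redundant.

    embeddings: (N, D) features from a contrastively trained encoder.
    """
    # Normalize rows so the inner product equals cosine similarity.
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T                      # (N, N) pairwise cossim
    sim.fill_diagonal_(-1.0)           # exclude self-similarity
    # High similarity to some neighbor -> redundant -> low score.
    return 1.0 - sim.max(dim=1).values

# Toy usage: three distinct examples plus a near-duplicate of example 0.
feats = torch.randn(3, 128)
feats = torch.cat([feats, feats[:1] + 0.01 * torch.randn(1, 128)])
print(coreset_scores(feats))  # examples 0 and 3 receive the lowest scores
```

In this reading, near-duplicates sit close together in embedding space, so their nearest-neighbor cossim is high and their coreset score is low, flagging them for removal before annotation.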

Highlights

  • Deep-learning-based approaches have been a key technique in various computer vision tasks, such as image classification [1], object detection [2], and image segmentation [3].

  • We provided a theoretical explanation of why the coreset score established by contrastive learning is effective for capturing redundant examples.

  • We focused on semantic segmentation tasks and used the validation set as the test set because the annotations of the test set are inaccessible.

Summary

Introduction

Deep-learning-based approaches have been a key technique in various computer vision tasks, such as image classification [1], object detection [2], and image segmentation [3]. To reduce the annotation cost while achieving great performance, assessing the quality of unlabeled examples is a key step. Unlabeled pools often contain redundant examples that share similar visual context; in terms of annotation cost, these redundant examples require a higher annotation budget while exhibiting less fruitful features. Yan et al. [6] recently achieved key-frame detection in a self-supervised manner, that is, at zero annotation cost, and verified their method through thorough experiments on an action recognition dataset. Ju et al. [7] achieved unsupervised coreset selection using a coreset score established by contrastive learning; their goal was to identify the subset of unlabeled data that contributes most to performance.
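As an illustration of that selection step (the budget value and the score array below are assumptions for the example, not values from [7]), choosing the subset to annotate can be as simple as ranking the unlabeled pool by coreset score and keeping the highest-scoring examples:

```python
import numpy as np

def select_for_annotation(scores: np.ndarray, budget: int) -> np.ndarray:
    # Keep the `budget` highest-scoring (least redundant) examples.
    return np.argsort(scores)[::-1][:budget]

scores = np.array([0.02, 0.80, 0.75, 0.03, 0.60])  # two near-duplicates score low
print(select_for_annotation(scores, budget=3))     # -> [1 2 4]
```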
