Unsupervised Visual Representation Learning via Dual-Level Progressive Similar Instance Selection.

Hehe Fan, Ping Liu, Yi Yang, Mingliang Xu

doi:10.1109/tcyb.2021.3054978

Abstract

The superiority of deeply learned representations relies on large-scale labeled datasets. However, annotating data is usually expensive and, in some scenarios, infeasible. To address this problem, we propose an unsupervised method that leverages instance discrimination and similarity for deep visual representation learning. The method is based on the observation that convolutional neural networks (CNNs) can learn a meaningful visual representation through instancewise classification, in which each instance is treated as an individual class. Through this instancewise discriminative learning, instances become reasonably distributed in the representation space, which reveals their similarities. To further improve visual representations, we propose a dual-level progressive similar instance selection (DPSIS) method that builds a bridge from instance to class by selecting similar instances (neighbors) for each instance (anchor) and treating the anchor and its neighbors as the same class. Specifically, DPSIS adaptively selects two levels of neighbors: 1) an "absolutely similar level" and 2) a "relatively similar level." Instances in the absolutely similar level are used as hard labels, while instances in the relatively similar level are used as soft labels. Moreover, during training, DPSIS progressively selects more neighbors without human supervision. At the beginning of training, because the CNN is weak, most instances are distributed relatively randomly in the representation space and only a few easy-to-recognize instances are selected as neighbors. As the CNN becomes stronger, the semantic meaning of each instance grows clearer, and instances that were originally distributed in a relatively random manner gradually move to meaningful positions. This in turn facilitates CNN training, since the number of reliable samples increases. Experiments on seven benchmarks, including three small-scale and two large-scale coarse-grained image classification datasets and two fine-grained categorization datasets, demonstrate the effectiveness of DPSIS. Our code has been released at https://github.com/hehefan/DPSIS.
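
The two-level selection described in the abstract can be illustrated with a minimal sketch. This is not the released DPSIS implementation: the abstract does not state the selection rule, so the use of cosine similarity, the fixed thresholds `hard_thresh` and `soft_thresh`, the function name `select_neighbors`, and the toy feature vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_neighbors(features, anchor, hard_thresh, soft_thresh):
    """Split all non-anchor instances into two levels by similarity to the anchor.

    Hypothetical rule (an assumption, not the paper's criterion):
      sim >= hard_thresh                -> "absolutely similar" (hard label)
      soft_thresh <= sim < hard_thresh  -> "relatively similar" (soft label,
                                           kept together with its similarity)
    """
    hard, soft = [], []
    for j, feat in enumerate(features):
        if j == anchor:
            continue
        sim = cosine(features[anchor], feat)
        if sim >= hard_thresh:
            hard.append(j)
        elif sim >= soft_thresh:
            soft.append((j, sim))
    return hard, soft


# Toy 2-D features early in training: instance 2 is only loosely aligned
# with the anchor (instance 0).
early = [[1.0, 0.0], [0.99, 0.1], [0.7, 0.7], [0.0, 1.0]]
hard_early, soft_early = select_neighbors(early, 0, 0.95, 0.6)

# Toy features later in training: instance 2 has moved closer to the anchor,
# so the same rule now admits it to the hard level.
later = [[1.0, 0.0], [0.99, 0.1], [0.97, 0.2], [0.0, 1.0]]
hard_later, soft_later = select_neighbors(later, 0, 0.95, 0.6)

print(hard_early, [j for j, _ in soft_early])
print(hard_later, [j for j, _ in soft_later])
```

In this toy setup the set of hard neighbors grows only because the features change between the two calls, which mirrors the abstract's description of more reliable neighbors becoming available as the network improves. How DPSIS actually adapts its selection during training is specified in the paper and the released code, not here.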

Full Text