Abstract

Self-supervised learning can mine deep semantic information from visual data without large amounts of human-annotated supervision by pretraining a model on a pretext task. In this study, we propose a novel self-supervised learning paradigm, namely multi-task self-supervised (MTSS) representation learning. Unlike existing self-supervised learning methods, which pretrain a neural network on a pretext task and then fine-tune its parameters on a downstream task, our scheme treats the downstream and pretext tasks as primary and auxiliary tasks, respectively, and trains them simultaneously. Our method maximizes the similarity of two augmented views of an image as the auxiliary task and uses a multi-task network to train the primary task alongside it. We evaluated the proposed method on standard datasets and backbones through a rigorous experimental procedure. Experimental results show that the proposed MTSS achieves better performance and robustness than other self-supervised learning methods on multiple image classification datasets without using negative sample pairs or large batches. This simple yet effective method may prompt a rethinking of self-supervised learning.
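
The following is a minimal sketch of the joint-training idea described in the abstract, not the authors' implementation: a shared backbone feeds a primary classification head and an auxiliary projection head, and the auxiliary loss maximizes the cosine similarity of two augmented views of the same image, with no negative pairs. Names such as `backbone`, `proj_dim`, and `aux_weight` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MTSSNet(nn.Module):
    """Shared encoder with a primary (classification) head and an auxiliary (projection) head."""

    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone                              # shared encoder, e.g. a ResNet without its final fc layer
        self.classifier = nn.Linear(feat_dim, num_classes)    # primary (downstream) head
        self.projector = nn.Sequential(                       # auxiliary (pretext) head
            nn.Linear(feat_dim, proj_dim),
            nn.ReLU(),
            nn.Linear(proj_dim, proj_dim),
        )

    def forward(self, x: torch.Tensor):
        h = self.backbone(x)
        return self.classifier(h), self.projector(h)


def mtss_loss(model: MTSSNet, view1, view2, labels, aux_weight: float = 1.0):
    """Joint loss: supervised cross-entropy on one view plus a similarity term
    between the projections of the two augmented views (trained simultaneously)."""
    logits1, z1 = model(view1)
    _, z2 = model(view2)
    primary = F.cross_entropy(logits1, labels)
    # Maximizing cosine similarity of the two views' embeddings = minimizing (1 - similarity).
    auxiliary = 1.0 - F.cosine_similarity(z1, z2, dim=-1).mean()
    return primary + aux_weight * auxiliary
```

In a training loop, each batch would be augmented twice, and a single optimizer step would minimize `mtss_loss`, so the representation is shaped by both tasks at once rather than in a pretrain-then-fine-tune sequence.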
