Abstract

Convolutional neural network (CNN)-based self-supervised visual representation learning (SSL) is a long-standing problem that has achieved significant success with traditional handcrafted pretext tasks and contrastive learning. However, existing SSL methods typically suffer from high computational overhead and poor performance due to slow convergence and weak detail-extraction capabilities. In this work, to address these issues and improve robustness, we propose a new self-supervised architecture that incorporates a single-branch backbone with hybrid tasks into the representation learning process. Specifically, our method exploits features from both intra- and inter-image relations by using discrete montage images. A single backbone with a novel Adaptive Decouple Confusion (ADC) module is then proposed to improve feature extraction and to alleviate the confusion regions in montage images. In addition, both concatenated discrete vectors and patch-based global average pooled vectors in the latent space are used to learn local detailed features while maintaining semantic consistency. Moreover, our method is optimized with hybrid tasks and, owing to these improvements, converges faster. Extensive experiments on several datasets demonstrate the effectiveness and robustness of our method, which improves linear-classification accuracy by 2.0% over conventional single-branch methods.
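The montage-image construction above can be illustrated with a minimal sketch. The abstract does not specify the exact sampling rule, so this example assumes a simple grid montage whose patches are drawn uniformly at random from several source images (capturing inter-image features) and at random locations within each source (capturing intra-image features); the function name `make_montage`, the grid size, and the patch size are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def make_montage(images, grid=2, patch=16, seed=0):
    """Assemble a (grid*patch) x (grid*patch) montage image whose cells are
    patches drawn from several source images (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    side = grid * patch
    montage = np.zeros((side, side, 3), dtype=images[0].dtype)
    for gy in range(grid):
        for gx in range(grid):
            # Pick a source image at random (inter-image mixing) ...
            src = images[rng.integers(len(images))]
            # ... and a random crop location within it (intra-image variation).
            y = rng.integers(src.shape[0] - patch + 1)
            x = rng.integers(src.shape[1] - patch + 1)
            montage[gy * patch:(gy + 1) * patch,
                    gx * patch:(gx + 1) * patch] = src[y:y + patch, x:x + patch]
    return montage

# Toy usage: three flat-colored 32x32 RGB images.
imgs = [np.full((32, 32, 3), v, dtype=np.uint8) for v in (0, 128, 255)]
m = make_montage(imgs)
print(m.shape)  # (32, 32, 3)
```

Adjacent cells of such a montage can come from different images, which is what produces the "confusion regions" at patch boundaries that the ADC module is designed to alleviate.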
