Abstract
Convolutional neural network (CNN)-based self-supervised visual representation learning (SSL) is a long-standing problem that has achieved notable success with traditional handcrafted pretext tasks and contrastive learning. However, existing SSL methods typically suffer from high computational overhead and weak performance owing to slow convergence and a limited ability to capture fine details. In this work, to address these issues and improve robustness, we propose a new self-supervised architecture that incorporates a single-branch backbone with hybrid tasks into the representation learning process. Specifically, our method exploits features from both intra- and inter-image relations by using discrete montage images. A single backbone with a novel Adaptive Decouple Confusion (ADC) module is then proposed to strengthen feature extraction and to alleviate the confusion regions in montage images. In addition, both concatenated discrete vectors and patch-based global average pooled vectors in the latent space are used to learn local detailed features while maintaining semantic consistency. Optimized with hybrid tasks, our method also enjoys faster convergence thanks to these improvements. Extensive experiments on several datasets demonstrate the effectiveness and robustness of our method: it improves linear classification accuracy by 2.0% over conventional single-branch methods.
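The montage construction and patch-based pooling mentioned above might be sketched as follows. This is a minimal, hypothetical illustration in NumPy: the abstract does not specify the actual patch selection, grid size, or shuffling strategy, so `make_montage` and `patch_gap_vectors` are assumed names and a simplified scheme, not the paper's implementation.

```python
import numpy as np

def make_montage(images, grid=2):
    """Tile a grid of patches drawn from different images into one montage.

    `images`: list of (C, H, W) arrays; H and W must be divisible by `grid`.
    Hypothetical scheme: patch (i, j) of the montage is copied from patch
    (i, j) of image (i*grid + j) % len(images).
    """
    c, h, w = images[0].shape
    ph, pw = h // grid, w // grid
    montage = np.zeros_like(images[0])
    for i in range(grid):
        for j in range(grid):
            src = images[(i * grid + j) % len(images)]
            montage[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = \
                src[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
    return montage

def patch_gap_vectors(feat, grid=2):
    """Patch-wise global average pooling over a (C, H, W) feature map.

    Returns a (grid*grid, C) array with one pooled vector per spatial patch;
    these can be concatenated for a local-detail objective or averaged
    for a global semantic one.
    """
    c, h, w = feat.shape
    ph, pw = h // grid, w // grid
    vecs = []
    for i in range(grid):
        for j in range(grid):
            vecs.append(
                feat[:, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].mean(axis=(1, 2))
            )
    return np.stack(vecs)
```

In this sketch the montage mixes content from several images, so per-patch pooled vectors carry inter-image signal, while the concatenation of all patch vectors preserves local detail that a single global pool would average away.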