Multi-View Masked Autoencoder for General Image Representation

Seungbin Ji, Sangkwon Han, Jongtae Rhee

doi:10.3390/app132212413

Abstract

Self-supervised learning learns general representations from unlabeled data. Masked image modeling (MIM), a generative self-supervised learning method, has drawn attention for its state-of-the-art performance on various downstream tasks, but its token-level approach results in poor linear separability. In this paper, we propose a contrastive learning-based multi-view masked autoencoder for MIM that adds an image-level approach by learning common features from two differently augmented views. The contrastive loss strengthens MIM by capturing long-range global patterns. Our framework adopts a simple encoder-decoder architecture and learns rich and general representations through a simple process: (1) Two different views are generated from an input image by random masking, and a contrastive loss learns the semantic distance between the representations that the encoder produces for them. A high mask ratio of 80% works as a strong augmentation and alleviates the representation collapse problem. (2) With a reconstruction loss, the decoder learns to reconstruct the original image from the masked image. We assessed our framework through several experiments on benchmark datasets for image classification, object detection, and semantic segmentation. We achieved 84.3% fine-tuning accuracy and 76.7% linear probing accuracy on ImageNet-1K classification, exceeding previous studies, and obtained promising results on the other downstream tasks. The experimental results demonstrate that applying contrastive loss to masked image modeling yields rich and general image representations.
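
The two-step process described in the abstract can be illustrated with a small NumPy sketch. It is only a sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the InfoNCE form of the contrastive loss, the mean squared error over masked patches, the temperature, the loss weight `recon_weight`, and all array sizes are assumptions chosen for illustration. Random arrays stand in for the encoder and decoder outputs.

```python
import numpy as np

def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean vector of length num_patches; True marks a masked patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def info_nce(z1, z2, temperature=0.2):
    """InfoNCE loss (assumed form): row i of z1 and row i of z2 are a positive pair."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (assumed, MAE-style)."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
num_patches, patch_dim, embed_dim, batch = 196, 768, 128, 4  # hypothetical sizes

# Step (1): two independently masked views of the same image, 80% mask ratio.
mask_a = random_mask(num_patches, 0.8, rng)
mask_b = random_mask(num_patches, 0.8, rng)

# Stand-ins for the encoder's image-level embeddings of the two views.
z_a = rng.normal(size=(batch, embed_dim))
z_b = z_a + 0.1 * rng.normal(size=(batch, embed_dim))
contrastive_loss = info_nce(z_a, z_b)

# Step (2): stand-in for the decoder's reconstruction of one image's patches.
patches = rng.normal(size=(num_patches, patch_dim))
pred = patches + 0.1 * rng.normal(size=patches.shape)
reconstruction_loss = masked_mse(pred, patches, mask_a)

recon_weight = 1.0  # hypothetical weighting between the two losses
total_loss = contrastive_loss + recon_weight * reconstruction_loss
print(mask_a.sum(), mask_b.sum(), total_loss)
```

With 196 patches, an 80% ratio masks 157 patches per view, so each view exposes only 39 patches to the encoder, and the two views usually share few visible patches. This is one way to read the abstract's remark that heavy masking acts as a strong augmentation.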

Journal: Applied Sciences
Publication Date: Nov 16, 2023
Citations: 1
License type: CC BY 4.0
