DDAE: Towards Deep Dynamic Vision BERT Pretraining

Honghao Chen, Xiangyu Zhang, Kaiqi Huang, Xin Zhao, Xiangwen Kong

doi:10.1609/aaai.v38i2.27864

Abstract

Recently, masked image modeling (MIM) has shown promise in self-supervised representation learning. However, existing MIM frameworks treat all masked patches equally during reconstruction, ignoring that reconstruction difficulty can vary sharply across patches depending on their distance from visible patches. In this paper, we propose a novel deep dynamic supervision that enables MIM methods to dynamically reconstruct patches of different difficulty at different pretraining phases and depths of the model. Deep dynamic supervision provides more locality inductive bias for ViTs, especially in deep layers, which inherently compensates for the absence of a local prior in the self-attention mechanism. Built upon deep dynamic supervision, we propose the Deep Dynamic AutoEncoder (DDAE), a simple yet effective MIM framework that applies dynamic mechanisms to pixel regression and feature self-distillation simultaneously. Extensive experiments across a variety of vision tasks, including ImageNet classification, semantic segmentation on ADE20K, and object detection on COCO, demonstrate the effectiveness of our approach.
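
The observation that reconstruction difficulty depends on a masked patch's distance from visible patches can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the Euclidean distance on the patch grid, and the exponent `beta` used to shift emphasis between near and far patches are hypothetical and are not taken from the paper's formulation.

```python
import math


def distance_to_visible(mask):
    """Distance from every patch to its nearest visible patch.

    mask is a 2D list of booleans: True marks a masked patch, False a
    visible one. Visible patches get distance 0.0. Euclidean distance on
    the patch grid is an assumption made for this sketch.
    """
    visible = [(r, c) for r, row in enumerate(mask)
               for c, masked in enumerate(row) if not masked]
    if not visible:
        raise ValueError("mask must leave at least one patch visible")
    return [[min(math.hypot(r - vr, c - vc) for vr, vc in visible)
             for c in range(len(row))]
            for r, row in enumerate(mask)]


def patch_weights(mask, beta):
    """Hypothetical per-patch loss weights proportional to distance ** beta.

    beta < 0 emphasizes masked patches close to visible ones (easier),
    beta > 0 emphasizes distant ones (harder), and beta == 0 weights all
    masked patches equally. Visible patches always get weight 0.0, and
    the weights over masked patches sum to 1.
    """
    dist = distance_to_visible(mask)
    raw = [[(dist[r][c] ** beta if masked else 0.0)
            for c, masked in enumerate(row)]
           for r, row in enumerate(mask)]
    total = sum(sum(row) for row in raw)
    if total == 0.0:
        raise ValueError("mask must contain at least one masked patch")
    return [[w / total for w in row] for row in raw]


# Usage: a 3x3 patch grid where only the center patch is visible.
mask = [[True, True, True],
        [True, False, True],
        [True, True, True]]
near_first = patch_weights(mask, beta=-1.0)  # favors edge-adjacent patches
far_first = patch_weights(mask, beta=1.0)    # favors corner patches
```

Varying `beta` over the course of training, or across layers, is one simple way to picture a supervision signal that changes its emphasis between easy and hard patches; how DDAE actually schedules or learns this emphasis is not specified in the abstract.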

More From: Proceedings of the AAAI Conference on Artificial Intelligence
