Abstract
Masked image modeling (MIM) is widely regarded as the state-of-the-art (SOTA) self-supervised learning (SSL) technique for visual pretraining. Its impressive generalization ability has also paved the way for the remarkable success of large-scale vision foundation models. In this article, we discuss the validity and advantages of implementing MIM in reproducing kernel Hilbert spaces (RKHSs), and we tie the analysis to a novel MIM method named R-MIM (short for RKHS-MIM). By carefully constructing an augmentation graph and applying spectral decomposition techniques, we establish a systematic theoretical link between R-MIM's generalization ability and the choice of kernel function used during training. Specifically, we conclude that the local Lipschitz constant of the resulting R-MIM model and the corresponding expected pretraining error can together have a strong composite effect on the bound on downstream task error, depending on the kernel chosen. We demonstrate that, under mild mathematical assumptions, R-MIM is guaranteed to attain a lower error bound on downstream tasks than vanilla MIM techniques such as masked autoencoder (MAE) and SimMIM. Empirical results corroborate our theoretical analysis, showing the superior generalization of the proposed R-MIM and its theoretical link to kernel choice. The code is available at: https://github.com/yurui-q/R-MIM.
More From: IEEE transactions on neural networks and learning systems