Enhancing performance of vision transformers on small datasets through local inductive bias incorporation

Ibrahim Batuhan Akkaya, Bahram Zonooz, Elahe Arani, Senthilkumar S Kathiresan

doi:10.1016/j.patcog.2024.110510

Abstract

Vision transformers (ViTs) achieve remarkable performance on large datasets but tend to perform worse than convolutional neural networks (CNNs) when trained from scratch on smaller datasets, possibly because the architecture lacks a local inductive bias. Recent studies have therefore added locality to the architecture and shown that it can help ViTs reach performance comparable to CNNs in the small-dataset regime. Existing methods, however, are architecture-specific or incur higher computational and memory costs. To address this, we propose a module called Local InFormation Enhancer (LIFE) that extracts patch-level local information and incorporates it into the embeddings used in the self-attention block of ViTs. The proposed module is memory- and computation-efficient, and flexible enough to process auxiliary tokens such as the classification and distillation tokens. Empirical results show that adding the LIFE module improves the performance of ViTs on small image classification datasets. We further demonstrate that the effect extends to downstream tasks such as object detection and semantic segmentation. In addition, we introduce a new visualization method, Dense Attention Roll-Out, designed specifically for dense prediction tasks, which generates class-specific attention maps from the attention maps of all tokens. The code for this project is available on GitHub (https://github.com/NeurAI-Lab/LIFE).
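The abstract does not describe LIFE's internals. As a rough, hypothetical illustration of what "incorporating patch-level local information into the token embeddings" can mean in general, the sketch below treats the patch tokens as a 2D grid and adds to each patch embedding the mean of its 3x3 neighbourhood (clipped at the borders), while auxiliary tokens such as a classification token are passed through unchanged. This is not the LIFE module. The function name `add_local_context`, the neighbourhood averaging, the residual addition, and the pass-through handling of auxiliary tokens are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def add_local_context(tokens: List[Vector], grid_h: int, grid_w: int,
                      num_aux: int = 1) -> List[Vector]:
    """Hypothetical locality step (NOT the LIFE module from the paper).

    tokens: `num_aux` auxiliary tokens (e.g. class/distillation tokens)
            followed by grid_h * grid_w patch tokens in row-major order.
    Each patch embedding gets the mean of its 3x3 neighbourhood added to it;
    auxiliary tokens are copied unchanged (an assumption of this sketch).
    """
    aux, patches = tokens[:num_aux], tokens[num_aux:]
    if len(patches) != grid_h * grid_w:
        raise ValueError("number of patch tokens must equal grid_h * grid_w")
    dim = len(patches[0]) if patches else 0
    out: List[Vector] = [list(t) for t in aux]
    for r in range(grid_h):
        for c in range(grid_w):
            # Sum the embeddings of all neighbours that lie inside the grid.
            acc = [0.0] * dim
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < grid_h and 0 <= cc < grid_w:
                        neighbour = patches[rr * grid_w + cc]
                        for k in range(dim):
                            acc[k] += neighbour[k]
                        count += 1
            center = patches[r * grid_w + c]
            # Residual addition: original embedding plus its local average.
            out.append([center[k] + acc[k] / count for k in range(dim)])
    return out


# Usage: one class token followed by a 2x2 grid of 1-D patch embeddings.
print(add_local_context([[100.0], [1.0], [2.0], [3.0], [4.0]], 2, 2))
```

On a 2x2 grid every patch's 3x3 neighbourhood covers the whole grid, so each patch gains the same local mean (2.5), while the class token is left as-is. A real ViT block would operate on batched tensors with a learned local operator; plain lists are used here only to keep the sketch dependency-free.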

Journal: Pattern Recognition	Publication Date: Apr 21, 2024
Citations: 5

Editage

Paperpal

R Discovery

Mind the Graph

R Discovery Prime

R Discovery Prime

Enhancing performance of vision transformers on small datasets through local inductive bias incorporation

Abstract

Talk to us

Similar Papers

More From: Pattern Recognition