Abstract

The development of deep learning models for medical image analysis is largely limited by the lack of large, well-annotated datasets. Unsupervised learning requires no labels and is therefore well suited to medical image analysis, but most unsupervised methods still need large datasets to work well. To make unsupervised learning applicable to small datasets, we propose Swin MAE, a masked autoencoder with a Swin Transformer backbone. Even on a dataset of only a few thousand medical images, Swin MAE learns useful semantic features purely from the images, without using any pre-trained model. In transfer learning on downstream tasks, it equals or even slightly outperforms the supervised model obtained by training Swin Transformer on ImageNet. Compared to MAE, Swin MAE improves downstream-task performance by roughly a factor of two on BTCV and a factor of five on our parotid dataset. The code is publicly available at https://github.com/Zian-Xu/Swin-MAE.
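
To make the masked-autoencoder idea concrete, the sketch below shows MAE-style pretraining on image patches: random patches are hidden, the network reconstructs them, and the loss is computed only on the hidden patches. This is a minimal conceptual sketch, not the authors' implementation (see the linked repository for that); the encoder here is a plain Transformer stand-in for the Swin Transformer backbone, and all class names, layer sizes, and the masking ratio are illustrative assumptions.

```python
# Minimal conceptual sketch of masked-autoencoder pretraining on image patches.
# NOT the authors' code: the encoder is a generic Transformer stand-in for the
# Swin Transformer backbone, and every name and hyperparameter here is hypothetical.
import torch
import torch.nn as nn


class ToyMaskedAutoencoder(nn.Module):
    def __init__(self, img_size=224, patch_size=16, dim=256, mask_ratio=0.75):
        super().__init__()
        self.patch_size = patch_size
        self.mask_ratio = mask_ratio
        self.num_patches = (img_size // patch_size) ** 2
        patch_dim = 3 * patch_size * patch_size

        self.patch_embed = nn.Linear(patch_dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        # Stand-in encoder/decoder; Swin MAE uses a Swin Transformer encoder instead.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, patch_dim)

    def patchify(self, imgs):
        # (B, 3, H, W) -> (B, N, patch_dim): split the image into flat patches.
        p = self.patch_size
        B, C, H, W = imgs.shape
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(B, -1, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)                       # (B, N, patch_dim)
        tokens = self.patch_embed(patches) + self.pos_embed

        # Randomly mask a fraction of patches; here masked positions are replaced
        # with a learnable mask token so the token grid stays regular.
        B, N, D = tokens.shape
        mask = torch.rand(B, N, device=tokens.device) < self.mask_ratio  # True = masked
        tokens = torch.where(mask.unsqueeze(-1),
                             self.mask_token.expand(B, N, D), tokens)

        latent = self.encoder(tokens)
        pred = self.head(self.decoder(latent))              # reconstruct pixel patches

        # As in MAE, the reconstruction loss is computed only on masked patches.
        loss = ((pred - patches) ** 2).mean(dim=-1)
        loss = (loss * mask).sum() / mask.sum().clamp(min=1)
        return loss


# Example pretraining step on unlabeled images (shapes only; no real data assumed).
model = ToyMaskedAutoencoder()
images = torch.randn(2, 3, 224, 224)
loss = model(images)
loss.backward()
```

Whether masked positions are dropped from the encoder input or replaced by mask tokens, and how masking interacts with Swin's window attention, is specific to the paper's design; the linked repository is the authoritative reference for those details.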
