Abstract

Large-scale labeled datasets are crucial for the success of supervised learning in medical imaging. However, annotating histopathological images is a time-consuming and labor-intensive task that requires highly trained professionals. To address this challenge, self-supervised learning (SSL) can be utilized to pre-train models on large amounts of unsupervised data and transfer the learned representations to various downstream tasks. In this study, we propose a self-supervised Pyramid-based Local Wavelet Transformer (PLWT) model for effectively extracting rich image representations. The PLWT model extracts both local and global features to pre-train a large number of unlabeled histopathology images in a self-supervised manner. Wavelet is used to replace average pooling in the downsampling of the multi-head attention, achieving a significant reduction in information loss during the transmission of image features. Additionally, we introduce a Local Squeeze-and-Excitation (Local SE) module in the feedforward network in combination with the inverse residual to capture local image information. We evaluate PLWT’s performance on three histopathological images and demonstrate the impact of pre-training. Our experiment results indicate that PLWT with self-supervised learning performs highly competitive when compared with other SSL methods, and the transferability of visual representations generated by SSL on domain-relevant histopathological images exceeds that of the supervised baseline trained on ImageNet.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call