Semi-supervised vision transformer with adaptive token sampling for breast cancer classification.

Wei Wang,Zhifeng Xiao,Feng Yuan,Ning Cui,Qian Li,Ran Jiang

doi:10.3389/fphar.2022.929755

Abstract

Various imaging techniques combined with machine learning (ML) models have been used to build computer-aided diagnosis (CAD) systems for breast cancer (BC) detection and classification. The rise of deep learning models in recent years, represented by convolutional neural network (CNN) models, has pushed the accuracy of ML-based CAD systems to a new level that is comparable to human experts. Existing studies have explored the usage of a wide spectrum of CNN models for BC detection, and supervised learning has been the mainstream. In this study, we propose a semi-supervised learning framework based on the Vision Transformer (ViT). The ViT is a model that has been validated to outperform CNN models on numerous classification benchmarks but its application in BC detection has been rare. The proposed method offers a custom semi-supervised learning procedure that unifies both supervised and consistency training to enhance the robustness of the model. In addition, the method uses an adaptive token sampling technique that can strategically sample the most significant tokens from the input image, leading to an effective performance gain. We validate our method on two datasets with ultrasound and histopathology images. Results demonstrate that our method can consistently outperform the CNN baselines for both learning tasks. The code repository of the project is available at https://github.com/FeiYee/Breast-area-TWO.

Full Text