Conventional methods for tumor diagnosis suffer from two inherent limitations: they are time-consuming and subjective. Computer-aided diagnosis (CAD) is an important approach for addressing these limitations. Pathology whole-slide images (WSIs) are high-resolution tissue images that have made significant contributions to cancer diagnosis and prognosis assessment. Because of the complexity of WSIs and the availability of only slide-level labels, multiple instance learning (MIL) has become the primary framework for WSI classification. However, most MIL methods fail to capture the interdependence among the image patches within a WSI, which is crucial for accurate classification. Moreover, under the weak supervision of slide-level labels, overfitting may occur during training. To address these issues, this paper proposes a dual-attention-based multiple instance learning framework (DAMIL). DAMIL leverages the spatial relationships among WSI patches and the channel information of their features for classification, without requiring detailed pixel-level tumor annotations. The output of the model preserves the semantic variations in the latent space, improves invariance to semantic perturbations, and provides reliable class identification for the final slide-level representation. We validate the effectiveness of DAMIL on the most commonly used public dataset, Camelyon16. The results demonstrate that DAMIL outperforms state-of-the-art methods in terms of classification accuracy (ACC), area under the curve (AUC), and F1-score. The dual-attention weights can also be visualized to examine the model's interpretability. To the best of our knowledge, this is the first attempt to use a dual-attention mechanism, considering both spatial and channel information, for whole-slide image classification.
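The abstract does not specify DAMIL's architecture in detail, so the following is only a minimal NumPy sketch of the general idea it describes: channel attention that reweights patch feature dimensions, patch-level (spatial) attention that scores patches, and attention-weighted MIL pooling into a single slide-level representation. All weight matrices (`W_c`, `w_s`, `w_cls`) are random stand-ins for parameters that would be learned from slide-level labels; none of these names come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# N patch embeddings of dimension D (in practice extracted by a CNN backbone)
N, D = 50, 128
X = rng.normal(size=(N, D))

# --- Channel attention (hypothetical form): gate feature channels ---
# Squeeze patches by averaging, then produce a sigmoid gate per channel.
W_c = rng.normal(size=(D, D)) * 0.1                 # stand-in for learned weights
c = 1.0 / (1.0 + np.exp(-(X.mean(axis=0) @ W_c)))   # channel gate, shape (D,)
X_c = X * c                                         # channel-reweighted patch features

# --- Spatial (patch-level) attention: score each patch ---
w_s = rng.normal(size=(D,)) * 0.1                   # stand-in for learned vector
a = softmax(np.tanh(X_c) @ w_s, axis=0)             # attention over the N patches

# --- Attention-weighted MIL pooling -> slide-level representation ---
z = a @ X_c                                         # shape (D,)

# Slide-level classifier head (binary: tumor vs. normal)
w_cls = rng.normal(size=(D,)) * 0.1
p_tumor = 1.0 / (1.0 + np.exp(-(z @ w_cls)))        # probability in (0, 1)
```

The attention vector `a` is what a visualization of the spatial branch would display as a heatmap over patch locations, while the gate `c` indicates which feature channels dominate the slide-level decision.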