Abstract

Motivation
The classification task based on whole-slide images (WSIs) is a classic problem in computational pathology. Multiple Instance Learning (MIL) provides a robust framework for analyzing whole-slide images with slide-level labels at gigapixel resolution. However, existing MIL models typically focus on modeling the relationships between instances while neglecting the variability across the channel dimensions of instances, which prevents the model from fully capturing critical information in the channel dimension.

Results
To address this issue, we propose a plug-and-play module called the Multi-scale Channel Attention Block (MCAB), which models the interdependencies between channels by leveraging local features with different receptive fields. By alternately stacking four layers of Transformer and MCAB, we designed a channel-attention-based MIL model (CAMIL) capable of simultaneously modeling both inter-instance relationships and intra-channel dependencies. To verify the performance of the proposed CAMIL in classification tasks, comprehensive experiments were conducted across three datasets: Camelyon16, TCGA-NSCLC, and TCGA-RCC. Empirical results demonstrate that, whether the feature extractor is pretrained on natural images or on WSIs, CAMIL surpasses current state-of-the-art MIL models across multiple evaluation metrics.

Availability
All implementation code is available at https://github.com/maojy0914/CAMIL

Supplementary information
Supplementary data are available at Bioinformatics online.
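To make the idea of channel attention over a bag of instance features concrete, the following is a minimal, untrained NumPy sketch of an MCAB-style gate. It is an illustration only, not the authors' implementation: the function name `multi_scale_channel_attention`, the choice of kernel sizes, the mean-over-instances channel descriptor, and the averaging kernels are all assumptions; the paper's actual module is learned and stacked with Transformer layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(x, kernel):
    # 1-D convolution along the channel axis with 'same' padding
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def multi_scale_channel_attention(feats, kernel_sizes=(3, 5, 7)):
    """Hypothetical sketch of a multi-scale channel attention gate.

    feats: (n_instances, n_channels) bag of patch features from a WSI.
    A channel descriptor (mean over instances) is filtered with several
    kernel sizes -- each acting as a different local 'receptive field'
    over neighbouring channels -- and the fused, sigmoid-squashed gate
    reweights every channel of every instance.
    """
    desc = feats.mean(axis=0)                    # (C,) channel descriptor
    gates = []
    for k in kernel_sizes:
        kern = np.full(k, 1.0 / k)               # untrained averaging kernel
        gates.append(conv1d_same(desc, kern))
    gate = sigmoid(np.mean(gates, axis=0))       # fuse scales, squash to (0, 1)
    return feats * gate                          # channel-wise reweighting

bag = np.random.default_rng(0).normal(size=(16, 32))  # 16 instances, 32 channels
out = multi_scale_channel_attention(bag)
print(out.shape)  # (16, 32): same shape, channels rescaled by the gate
```

In the actual model, a block like this would be interleaved with Transformer layers so that instance-to-instance attention and channel reweighting alternate.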