Accurate diagnosis of cancer subtypes is an essential task in computational pathology for personalized cancer treatment. Recent studies indicate that combining multimodal data, such as whole slide images (WSIs) and multi-omics data, can yield more accurate diagnoses. However, robust cancer diagnosis remains challenging because of the heterogeneity among multimodal data and the performance degradation caused by insufficient multimodal patient data. In this work, we propose a novel multimodal co-attention fusion network (MCFN) with online data augmentation (ODA) for cancer subtype classification. Specifically, a multimodal mutual-guided co-attention (MMC) module performs dense multimodal interactions, enabling the modalities to mutually guide and calibrate each other during integration and thereby alleviating inter- and intra-modal heterogeneity. Subsequently, a self-normalizing network (SNN)-Mixer is developed to allow information exchange among different omics data and to alleviate the high-dimensional, small-sample-size problem in multi-omics data. Most importantly, to compensate for the shortage of multimodal training samples, we propose an ODA module in MCFN, which leverages multimodal knowledge to guide the augmentation of WSIs and maximize data diversity during model training. Extensive experiments are conducted on the public TCGA dataset. The results show that the proposed MCFN outperforms all compared algorithms, demonstrating its effectiveness.
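As a rough illustration of the mutual-guidance idea behind co-attention, the following minimal sketch lets two sets of feature vectors attend to each other using plain scaled dot-product attention. It is not the MMC module described above: it has no learned projections, and the names `cross_attend`, `co_attention`, `wsi_tokens`, and `omics_tokens` are hypothetical, chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Let each query vector attend over the context vectors.

    Assumption: plain scaled dot-product attention with no learned
    query/key/value projections, purely for illustration.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Similarity of this query to every context vector.
        scores = [
            sum(qi * ci for qi, ci in zip(q, c)) / math.sqrt(dim)
            for c in context
        ]
        weights = softmax(scores)
        # Weighted average of the context vectors.
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return attended


def co_attention(wsi_tokens, omics_tokens):
    """Hypothetical two-way guidance: each modality queries the other."""
    wsi_guided = cross_attend(wsi_tokens, omics_tokens)
    omics_guided = cross_attend(omics_tokens, wsi_tokens)
    return wsi_guided, omics_guided


# Toy inputs: three WSI patch features and two omics features, both 2-D.
wsi_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
omics_tokens = [[0.5, 0.5], [2.0, -1.0]]
wsi_guided, omics_guided = co_attention(wsi_tokens, omics_tokens)
```

Each output row is a weighted average of the other modality's vectors, so `wsi_guided` has one row per WSI token and `omics_guided` one row per omics token. A trained model would typically add learned projections, multiple heads, and residual connections on top of this basic pattern.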