Abstract

Multimodal deep learning models have been applied for disease prediction tasks, but difficulties exist in training due to the conflict between sub-models and fusion modules. To alleviate this issue, we propose a framework for decoupling feature alignment and fusion (DeAF), which separates the multimodal model training into two stages. In the first stage, unsupervised representation learning is conducted, and the modality adaptation (MA) module is used to align the features from various modalities. In the second stage, the self-attention fusion (SAF) module combines the medical image features and clinical data using supervised learning. Moreover, we apply the DeAF framework to predict the postoperative efficacy of CRS for colorectal cancer and whether the MCI patients change to Alzheimer’s disease. The DeAF framework achieves a significant improvement in comparison to the previous methods. Furthermore, extensive ablation experiments are conducted to demonstrate the rationality and effectiveness of our framework. In conclusion, our framework enhances the interaction between the local medical image features and clinical data, and derive more discriminative multimodal features for disease prediction. The framework implementation is available at https://github.com/cchencan/DeAF.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call