Abstract

Advances in medical data collection technology have increased the demand for modeling cardiac physiological signals. However, current research focuses primarily on unimodal signals, leaving a gap in the study of more comprehensive multimodal signals. Directly applying modality-specific late fusion or modality-mixing early fusion fails to adequately capture cross-modal information. This paper proposes an optionally multimodal CNN-enhanced Transformer fusion network based on multiscale receptive fields. It introduces switching modal experts for stage-wise representation: the first stage extracts modality-specific features and balances inter-modal relationships, while the second stage captures cross-modal interaction information in a shared latent space, promoting deep modality fusion. Because the switching modal experts are flexible, the model can be applied not only to multimodal data but also to unimodal data. Additionally, to address the performance gap between Transformers and Convolutional Neural Networks (CNNs), we draw on the strengths of CNNs to construct a CNN-enhanced Transformer. Specifically, we improve the patch embedding to introduce multiscale receptive fields, and we integrate convolution and residual connections into the feed-forward network (FFN) to help it learn complex non-linear features and aggregate local features. Experimental results demonstrate that our model achieves outstanding performance in both unimodal and multimodal settings across different datasets, surpassing a range of CNN, Transformer, and CNN-Transformer hybrid networks.
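
To illustrate the two CNN enhancements summarized above (a multiscale patch embedding and an FFN augmented with convolution and residual connections), the following is a minimal PyTorch sketch. Module names, kernel sizes, and dimensions are assumptions for illustration, not the paper's exact configuration.

```python
# Illustrative sketch only: module names, kernel sizes, and dimensions are
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


class MultiScalePatchEmbed(nn.Module):
    """Patch embedding with parallel 1-D convolutions of different kernel
    sizes, giving each patch token a multiscale receptive field."""

    def __init__(self, in_channels=1, embed_dim=96, patch_size=16,
                 kernel_sizes=(3, 7, 15)):
        super().__init__()
        assert embed_dim % len(kernel_sizes) == 0
        branch_dim = embed_dim // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv1d(in_channels, branch_dim, kernel_size=k,
                      stride=patch_size, padding=k // 2)
            for k in kernel_sizes
        ])
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                      # x: (B, C, L)
        feats = [b(x) for b in self.branches]  # each: (B, branch_dim, L/patch)
        min_len = min(f.shape[-1] for f in feats)
        x = torch.cat([f[..., :min_len] for f in feats], dim=1)
        return self.norm(x.transpose(1, 2))    # (B, num_patches, embed_dim)


class ConvFFN(nn.Module):
    """Feed-forward network augmented with a depth-wise convolution and
    residual connections so it can also aggregate local, neighboring-token
    features alongside the usual point-wise non-linearity."""

    def __init__(self, embed_dim=96, hidden_dim=384, kernel_size=3):
        super().__init__()
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.dwconv = nn.Conv1d(hidden_dim, hidden_dim, kernel_size,
                                padding=kernel_size // 2, groups=hidden_dim)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, embed_dim)

    def forward(self, x):                      # x: (B, N, embed_dim)
        h = self.fc1(x)                        # (B, N, hidden_dim)
        # Depth-wise conv over the token dimension, with an inner residual path.
        h = h + self.dwconv(h.transpose(1, 2)).transpose(1, 2)
        return x + self.fc2(self.act(h))       # outer residual connection


if __name__ == "__main__":
    signal = torch.randn(2, 1, 2048)           # batch of 1-channel signals
    tokens = MultiScalePatchEmbed()(signal)    # (2, 128, 96)
    print(ConvFFN()(tokens).shape)             # torch.Size([2, 128, 96])
```

In this sketch, each convolution branch uses a different kernel size so that a single patch token mixes information from several temporal scales, and the depth-wise convolution inside the FFN lets neighboring tokens exchange local detail that pure point-wise layers would miss.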
