Abstract

Electrocardiograms (ECGs) and phonocardiograms (PCGs) are two modalities that provide complementary diagnostic information for improving the early detection accuracy of cardiovascular diseases (CVDs). Existing multi-modality methods mainly adopted early or late feature fusion strategies, which did not simultaneously exploit the complementary information contained in the low-level detail features and the high-level semantic features of the different modalities. Moreover, they were designed specifically for the multi-modality scenario in which both ECGs and PCGs are available, without considering the missing-modality scenarios, common in clinical practice, in which only ECGs or only PCGs are recorded. To address these challenges, we developed a Co-learning-assisted Progressive Dense fusion network (CPDNet) for end-to-end CVD detection, built on a three-branch interweaving architecture consisting of ECG and PCG modality-specific encoders and a progressive dense fusion encoder, which can be used in both multi-modality and missing-modality scenarios. Specifically, we designed a novel progressive dense fusion strategy that not only progressively fused multi-level complementary information of the two modalities, from low-level details to high-level semantics, but also applied dense fusion at each level to further enrich the available multi-modality information through mutual guidance between features at different levels. The strategy also integrated cross-modality region-aware and multi-scale feature optimization modules to fully evaluate the contributions of different modalities and signal regions and to enhance the network's ability to extract features from multi-scale target regions. Moreover, we designed a novel co-learning strategy that guided the training of the CPDNet by combining intra-modality and joint losses, ensuring that each encoder was well trained. This strategy not only assisted our fusion strategy by having the modality-specific encoders provide sufficiently discriminative features for the fusion encoder, but also enabled the CPDNet to handle missing-modality scenarios robustly by using the corresponding modality-specific encoder on its own. Experimental results on public and private datasets demonstrated that our method not only outperformed state-of-the-art multi-modality methods by at least 5.05% in average accuracy in the multi-modality scenario, but also achieved better performance than single-modality models in the missing-modality scenarios.
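
To make the described setup concrete, the following is a minimal PyTorch-style sketch of a three-branch arrangement with per-level fusion and a co-learning loss that combines intra-modality and joint terms. It is an illustration under stated assumptions, not the authors' implementation: the class names, layer widths, 1x1-convolution fusion operator, and loss weights are hypothetical, and the paper's cross-modality region-aware and multi-scale feature optimization modules are not reproduced here.

```python
# Hypothetical sketch of a three-branch co-learning setup as described in the
# abstract; module names, sizes, and the fusion operator are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Simple 1-D CNN encoder (stand-in for the ECG or PCG branch) that
    returns features at several levels plus its own modality-specific logits."""

    def __init__(self, in_channels: int, widths=(16, 32, 64)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_channels
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv1d(prev, w, kernel_size=7, stride=2, padding=3),
                nn.BatchNorm1d(w),
                nn.ReLU(inplace=True),
            ))
            prev = w
        self.head = nn.Linear(widths[-1], 2)  # per-modality CVD logits

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        logits = self.head(feats[-1].mean(dim=-1))
        return feats, logits


class ProgressiveDenseFusion(nn.Module):
    """Fuses ECG/PCG features level by level, from low-level to high-level,
    and re-injects the previously fused features at each level (a simplified
    stand-in for the paper's progressive dense fusion encoder)."""

    def __init__(self, widths=(16, 32, 64)):
        super().__init__()
        self.fuse = nn.ModuleList()
        fused_prev = 0
        for w in widths:
            self.fuse.append(nn.Conv1d(2 * w + fused_prev, w, kernel_size=1))
            fused_prev = w
        self.head = nn.Linear(widths[-1], 2)  # joint CVD logits

    def forward(self, ecg_feats, pcg_feats):
        fused = None
        for layer, fe, fp in zip(self.fuse, ecg_feats, pcg_feats):
            # Align temporal lengths before channel-wise concatenation.
            L = min(fe.shape[-1], fp.shape[-1])
            parts = [fe[..., :L], fp[..., :L]]
            if fused is not None:
                parts.append(F.adaptive_avg_pool1d(fused, L))
            fused = F.relu(layer(torch.cat(parts, dim=1)))
        return self.head(fused.mean(dim=-1))


def co_learning_loss(ecg_logits, pcg_logits, joint_logits, target,
                     w_intra=0.5, w_joint=1.0):
    """Intra-modality terms keep each encoder well trained on its own;
    the joint term supervises the fused prediction."""
    intra = (F.cross_entropy(ecg_logits, target)
             + F.cross_entropy(pcg_logits, target))
    joint = F.cross_entropy(joint_logits, target)
    return w_intra * intra + w_joint * joint
```

In this sketch, a missing-modality case would be handled as the abstract suggests: at inference time, only the available modality's encoder is run and its own logits are used, while the fused head is used when both ECG and PCG are present.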
