Abstract

Many convolutional neural network (CNN) based approaches to skin cancer classification rely solely on dermatological images and achieve strong classification accuracy. However, patient metadata, a crucial source of clinical information for dermatologists, can further improve accuracy. Existing multi-modal methods predominantly employ basic joint fusion structures (FSs) and fusion modules (FMs), leaving room for improvement through more sophisticated FS and FM designs. This paper therefore introduces a novel fusion method that integrates dermatological images (dermoscopy or clinical images) with patient metadata for skin cancer classification, focusing on both the FS and the FM. First, we propose a joint-individual fusion (JIF) structure that learns shared features across modalities while preserving modality-specific characteristics. Second, we introduce a multi-modal fusion attention (MMFA) module that amplifies the most relevant image and metadata features through combined self- and mutual-attention mechanisms, strengthening the decision-making pipeline. We compare the proposed JIF-MMFA method with other state-of-the-art fusion methods on three public datasets. The results show that JIF-MMFA consistently improves classification performance across different CNN backbones and outperforms the other fusion methods on all three datasets, demonstrating the effectiveness and robustness of our approach in skin cancer classification.
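
The abstract does not give implementation details, so the following PyTorch sketch is only one plausible reading of the JIF structure and MMFA module: the gating mechanism, feature dimensions, and the way the joint and individual branches are combined are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class MMFA(nn.Module):
    """Illustrative multi-modal fusion attention block (assumed design):
    each modality is re-weighted by a gate computed from its own features
    (self attention) and by a gate computed from the other modality
    (mutual attention)."""
    def __init__(self, img_dim: int, meta_dim: int):
        super().__init__()
        # Channel-wise self-attention gates for each modality.
        self.img_self = nn.Sequential(nn.Linear(img_dim, img_dim), nn.Sigmoid())
        self.meta_self = nn.Sequential(nn.Linear(meta_dim, meta_dim), nn.Sigmoid())
        # Mutual-attention gates: one modality gates the other.
        self.meta_to_img = nn.Sequential(nn.Linear(meta_dim, img_dim), nn.Sigmoid())
        self.img_to_meta = nn.Sequential(nn.Linear(img_dim, meta_dim), nn.Sigmoid())

    def forward(self, img_feat, meta_feat):
        img_out = img_feat * self.img_self(img_feat) * self.meta_to_img(meta_feat)
        meta_out = meta_feat * self.meta_self(meta_feat) * self.img_to_meta(img_feat)
        return img_out, meta_out

class JIFClassifier(nn.Module):
    """Illustrative joint-individual fusion: a joint head on the fused
    (attended) features plus individual heads that keep modality-specific
    features; the logits are averaged (combination rule is an assumption)."""
    def __init__(self, img_dim=512, meta_dim=64, num_classes=8):
        super().__init__()
        self.mmfa = MMFA(img_dim, meta_dim)
        self.joint_head = nn.Linear(img_dim + meta_dim, num_classes)
        self.img_head = nn.Linear(img_dim, num_classes)
        self.meta_head = nn.Linear(meta_dim, num_classes)

    def forward(self, img_feat, meta_feat):
        img_att, meta_att = self.mmfa(img_feat, meta_feat)
        joint_logits = self.joint_head(torch.cat([img_att, meta_att], dim=1))
        # Individual branches preserve the unfused, modality-specific features.
        img_logits = self.img_head(img_feat)
        meta_logits = self.meta_head(meta_feat)
        return (joint_logits + img_logits + meta_logits) / 3

# Example: features from a CNN backbone (e.g. a 512-d ResNet embedding) and
# a 64-d metadata encoder for a batch of 4 samples.
model = JIFClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 8])
```

In this reading, the joint branch captures shared cross-modal features while the individual branches retain each modality's own characteristics, which matches the abstract's description of the JIF structure at a high level.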
