Abstract
This paper introduces a novel approach to enhance content-based image retrieval, validated on two benchmark datasets: ISIC-2017 and ISIC-2018. These datasets comprise skin lesion images that are crucial for innovations in skin cancer diagnosis and treatment. We advocate the use of pre-trained Vision Transformer (ViT), a relatively uncharted concept in the realm of image retrieval, particularly in medical scenarios. In contrast to the traditionally employed Convolutional Neural Networks (CNNs), our findings suggest that ViT offers a more comprehensive understanding of the image context, essential in medical imaging. We further incorporate a weighted multi-loss function, delving into various losses such as triplet loss, distillation loss, contrastive loss, and cross-entropy loss. Our exploration investigates the most resilient combination of these losses to create a robust multi-loss function, thus enhancing the robustness of the learned feature space and ameliorating the precision and recall in the retrieval process. Instead of using all the loss functions, the proposed multi-loss function utilizes the combination of only cross-entropy loss, triplet loss, and distillation loss and gains improvement of 6.52% and 3.45% for mean average precision over ISIC-2017 and ISIC-2018. Another innovation in our methodology is a two-branch network strategy, which concurrently boosts image retrieval and classification. Through our experiments, we underscore the effectiveness and the pitfalls of diverse loss configurations in image retrieval. Furthermore, our approach underlines the advantages of retrieval-based classification through majority voting rather than relying solely on the classification head, leading to enhanced prediction for melanoma - the most lethal type of skin cancer. Our results surpass existing state-of-the-art techniques on the ISIC-2017 and ISIC-2018 datasets by improving mean average precision by 1.01% and 4.36% respectively, emphasizing the efficacy and promise of Vision Transformers paired with our tailor-made weighted loss function, especially in medical contexts. The proposed approach's effectiveness is substantiated through thorough ablation studies and an array of quantitative and qualitative outcomes. To promote reproducibility and support forthcoming research, our source code will be accessible on GitHub.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.