Abstract
Medical data have unique specificity and professionalism, requiring substantial domain expertise for their annotation. Precise data annotation is essential for anomaly-detection tasks, making the training process complex. Domain generalization (DG) is an important approach to enhancing medical image anomaly detection (AD). This paper introduces a novel multimodal anomaly-detection framework called MedicalCLIP. MedicalCLIP utilizes multimodal data in anomaly-detection tasks and establishes irregular constraints within modalities for images and text. The key to MedicalCLIP lies in learning intramodal detailed representations, which are combined with text semantic-guided cross-modal contrastive learning, allowing the model to focus on semantic information while capturing more detailed information, thus achieving more fine-grained anomaly detection. MedicalCLIP relies on GPT prompts to generate text, reducing the demand for professional descriptions of medical data. Text construction for medical data helps to improve the generalization ability of multimodal models for anomaly-detection tasks. Additionally, during the text-image contrast-enhancement process, the model's ability to select and extract information from image data is improved. Through hierarchical contrastive loss, fine-grained representations are achieved in the image-representation process. MedicalCLIP has been validated on various medical datasets, showing commendable domain generalization performance in medical-data anomaly detection. Improvements were observed in both anomaly classification and segmentation metrics. In the anomaly classification (AC) task involving brain data, the method demonstrated a 2.81 enhancement in performance over the best existing approach.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.