Abstract Background: Personalized breast cancer (BC) screening adjusts the imaging modality and frequency of exams according to a woman's risk of developing BC. This can lower cost and false positives by reducing unnecessary exams and has the potential to find more cancers at a curable stage. Deep learning (DL) is a class of artificial intelligence algorithms that progressively extract higher-level representations from raw input. A critical challenge to applying DL for BC risk prediction is that images are needed from exams performed before a possible cancer diagnosis, and large longitudinal datasets with cancer labeling are relatively scarce. Recently, new self-supervised methods have been developed that do not require labels. Instead, they learn to recognize higher-level features by comparing two augmented images and determining whether they are derived from the same original image. Methods: We developed Self-supervised AI for CAncer Risk Assessment (SAICARA), a mammography-based DL model to predict BC risk. We trained SAICARA on mammograms from the Chicago Multiethnic Epidemiologic Cohort (ChiMEC). For pretraining, we used the momentum contrast method to train an encoder that produces compact representations of input mammography views. We initialized the encoders with weights obtained from training on the ImageNet dataset. We continued pretraining with 223,415 chest radiographs from the CheXpert database. Finally, we pretrained on mammograms from ChiMEC, with no restriction on exam date. We used augmentations from two different mammography views to provide better positive pairs for self-supervised learning. For fine-tuning, we trained with exams from women who were known to be cancer-free with at least 100 days of follow-up, and from patients diagnosed with BC at least 30 days following the exam. Optimization was performed using a negative log-likelihood loss function over event times discretized at quantiles of the event-time distribution. 
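To make the discretized loss described above concrete, the following is a minimal sketch, not the SAICARA implementation. It assumes a logistic-hazard parameterization, in which the model outputs one hazard logit per time interval; the abstract does not state which discrete-time formulation was used. The function names (`quantile_cuts`, `discretize`, `discrete_nll`) and the choice to place interval edges at evenly spaced quantiles of the observed event times are likewise illustrative assumptions.

```python
import numpy as np

def quantile_cuts(durations, events, n_intervals):
    """Upper interval edges at evenly spaced quantiles of observed event times.

    Assumed scheme: only exams followed by a diagnosis (events == 1)
    contribute to the quantiles.
    """
    event_times = durations[events == 1]
    qs = np.linspace(0.0, 1.0, n_intervals + 1)[1:]
    return np.quantile(event_times, qs)

def discretize(durations, cuts):
    """Map each duration to the index of its interval.

    Durations beyond the last edge are clipped into the last interval.
    """
    idx = np.searchsorted(cuts, durations, side="left")
    return np.minimum(idx, len(cuts) - 1)

def discrete_nll(logits, idx, events):
    """Mean negative log-likelihood under a logistic-hazard model (assumed).

    logits: (n, k) array of per-interval hazard logits
    idx:    (n,) interval index of the event or censoring time
    events: (n,) 1 if BC was diagnosed, 0 if censored
    """
    n, k = logits.shape
    steps = np.arange(k)[None, :]
    # Each subject contributes terms for intervals up to and including idx.
    at_risk = steps <= idx[:, None]
    # The target is 1 only in the interval where an observed event occurs.
    target = (steps == idx[:, None]) & (events[:, None] == 1)
    # Binary cross-entropy with logits: softplus(x) - y * x.
    bce = np.logaddexp(0.0, logits) - target * logits
    return float((bce * at_risk).sum(axis=1).mean())
```

In this sketch, a censored exam contributes only survival terms through its last observed interval, while an exam followed by a diagnosis contributes survival terms for the earlier intervals and an event term for the interval containing the diagnosis.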
Hyperparameters were tuned using a Bayesian optimization strategy implemented by Weights and Biases. We computed the concordance index and the area under the receiver operating characteristic curve (AUC) at two years to evaluate the discriminative capacity of the predicted BC risk. We evaluated our model using 10-fold cross-validation. Results: In the final phase of pretraining, we used 13,194 mammography exams from 2,835 women. For fine-tuning, we used 4,849 exams from 1,418 women who were known to be cancer-free at their last follow-up, and 1,760 exams from 744 women whose exams were followed by a BC diagnosis. SAICARA achieved a mean concordance index of 0.62 (standard deviation, SD = 0.11) and a mean AUC of 0.61 (SD = 0.09). Conclusion: Self-supervised DL holds promise as a technique for improving the performance of image-based BC risk prediction models. Citation Format: Anna Woodard, Olasubomi J. Omoleye, Rachna Gupta, Fangyuan Zhao, Aarthi Koripelly, Ian Foster, Kyle Chard, Toshio F. Yoshimatsu, Yonglan Zheng, Dezheng Huo, Olufunmilayo I. Olopade. Self-supervised deep learning to assess breast cancer risk [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 5047.
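For readers unfamiliar with the concordance index reported in the Results, the sketch below shows one common definition (Harrell's C) in plain Python. It is an illustration only: the abstract does not say which estimator or software was used, and the function name `concordance_index` and its handling of ties (tied risk scores count as half-concordant, tied times are skipped) are assumptions.

```python
import numpy as np

def concordance_index(durations, events, risk):
    """Harrell's concordance index for right-censored data (assumed estimator).

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter time than subject j. The pair is concordant when
    the subject with the earlier event was assigned the higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if events[i] != 1:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to risk scores that order pairs no better than chance, and 1.0 to a perfect ordering, which gives a scale for reading the reported mean of 0.62.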