The clinical application of breast ultrasound for the assessment of cancer risk and of deep learning for the classification of breast-ultrasound images has been hindered by inter-grader variability and high false-positive rates, and by deep-learning models that do not follow Breast Imaging Reporting and Data System (BI-RADS) standards, lack explainability features, and have not been tested prospectively. Here, we show that an explainable deep-learning system trained on 10,815 multimodal breast-ultrasound images of 721 biopsy-confirmed lesions from 634 patients across two hospitals and prospectively tested on 912 additional images of 152 lesions from 141 patients predicts BI-RADS scores for breast cancer as accurately as experienced radiologists, with areas under the receiver operating characteristic curve of 0.922 (95% confidence interval (CI) = 0.868–0.959) for bimodal images and 0.955 (95% CI = 0.909–0.982) for multimodal images. Multimodal multiview breast-ultrasound images augmented with heatmaps for malignancy risk predicted via deep learning may facilitate the adoption of ultrasound imaging in screening mammography workflows.
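The abstract reports areas under the ROC curve with 95% confidence intervals on the prospective test set. A common way to obtain such intervals is percentile bootstrapping over resampled lesions; the sketch below (not the authors' code) illustrates this, assuming hypothetical arrays of biopsy-confirmed labels (`y_true`) and model-predicted malignancy probabilities (`y_score`).

```python
# Minimal sketch: AUC with a percentile-bootstrap 95% CI.
# Assumes binary labels and per-lesion scores; not the study's actual pipeline.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_bootstrap_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Return the point-estimate AUC and a (1 - alpha) percentile bootstrap CI."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    auc = roc_auc_score(y_true, y_score)
    boot = []
    n = len(y_true)
    while len(boot) < n_boot:
        idx = rng.integers(0, n, n)        # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                        # AUC undefined without both classes; redraw
        boot.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc, (lo, hi)

# Toy illustration (fabricated data, not the study's):
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.10, 0.35, 0.80, 0.65, 0.20, 0.90, 0.45, 0.70])
print(auc_with_bootstrap_ci(y_true, y_score))
```

Bootstrapping at the lesion level, as assumed here, keeps all views of one lesion together in each resample, which matters when multiple images per lesion would otherwise be treated as independent.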