To develop and test AI-integrated biopsy avoidance strategies to improve the specificity of screening breast ultrasound (US).

This retrospective study included consecutive asymptomatic women with BI-RADS 3, 4a, 4b, 4c, or 5 masses on screening breast US exams acquired from two hospitals between December 2019 and December 2020 (development cohort) and between June 2020 and December 2020 (external validation cohort). If more than one lesion was present, the most suspicious lesion was analyzed. Logistic regression was used to develop the AI-integrated biopsy avoidance strategies, in which BI-RADS 4a masses were downgraded to BI-RADS 3 if the AI classification was "both planes benign" (in all women) or "benign and malignant" (in women aged ≤ 45 years). Diagnostic performance metrics were calculated for both cohorts and compared to initial assessments by radiologists using the Wilcoxon rank-sum test for noninferiority of sensitivity (relative noninferiority margin, 5%) and the McNemar test for specificity.

The development and external validation cohorts consisted of 393 women (median age, 45 years [IQR, 40-50 years]) with 101 malignancies and 166 women (median age, 47 years [IQR, 42-51 years]) with 31 malignancies, respectively. In the external validation cohort, the developed strategy improved specificity from 53.3% (72/135; 95% CI: 45.0, 62.1) to 80.7% (109/135; 95% CI: 74.2, 87.5; p < 0.001) while maintaining sensitivity (both 100% [31/31; 95% CI: 98.9, 100]), and would have avoided 61.7% (37/60; 95% CI: 48.2, 73.7) of benign biopsies of BI-RADS 4a masses.

A strategy integrating AI classification in two orthogonal planes, age, and BI-RADS classification improved the specificity of screening breast US while maintaining non-inferior sensitivity.

Question How can integrating AI lesion classification, age, and BI-RADS assessment effectively reduce benign biopsies in screening breast ultrasound?
Findings A strategy integrating AI classifications, age, and BI-RADS using multivariable logistic regression improved specificity while maintaining non-inferior sensitivity in breast ultrasound screening.
Clinical relevance The integration of AI classification in two orthogonal planes, along with patient age and BI-RADS classification, shows potential for reducing benign breast biopsies without compromising sensitivity, leading to more efficient clinical decision-making, reduced patient anxiety, and decreased healthcare resource utilization.
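
The downgrade rule stated in the abstract can be written as a short decision function. The Python sketch below is illustrative only: it encodes the rule as worded here, not the study's logistic regression model or software. The names `Lesion`, `adjusted_birads`, and `percent` are hypothetical, and the assumption that the AI output arrives as one of the quoted strings is ours.

```python
from dataclasses import dataclass

# AI classification labels quoted in the abstract (two orthogonal planes).
BOTH_BENIGN = "both planes benign"
MIXED = "benign and malignant"

AGE_CUTOFF_YEARS = 45  # age threshold stated in the abstract


@dataclass
class Lesion:
    """Hypothetical record for the most suspicious lesion of one woman."""
    birads: str     # initial radiologist assessment: "3", "4a", "4b", "4c" or "5"
    ai_class: str   # combined AI classification over the two planes
    age_years: int


def adjusted_birads(lesion: Lesion) -> str:
    """Return the BI-RADS category after applying the downgrade rule.

    Only BI-RADS 4a masses can be downgraded; every other category is
    returned unchanged.
    """
    if lesion.birads != "4a":
        return lesion.birads
    if lesion.ai_class == BOTH_BENIGN:
        return "3"  # applies to women of any age
    if lesion.ai_class == MIXED and lesion.age_years <= AGE_CUTOFF_YEARS:
        return "3"  # applies only to women aged 45 years or younger
    return "4a"


def percent(numerator: int, denominator: int) -> float:
    """Proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Consistency check on the counts reported for the external validation cohort.
spec_before = percent(72, 135)    # specificity of the initial assessment
spec_after = percent(109, 135)    # specificity with the strategy applied
avoided = percent(37, 60)         # benign BI-RADS 4a biopsies avoided
print(spec_before, spec_after, avoided)
```

The reported counts are internally consistent: the 37 additional true negatives (109 minus 72) match the 37 benign BI-RADS 4a biopsies that would have been avoided.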