Deep learning algorithm performance in mammography screening: A systematic review and meta-analysis.

Rosimeire Aparecida Roela, Carlos Shimizu, Gabriel Vansuita Valente, Daniel Gustavo Pellacani Petrini, Maria A A Koike Folgueira, Rossana Veronica Mendoza Lopez, Maria Lucia Hirata Katayama, Guilherme Koike Folgueira, Hae Yong Kim, Guilherme Apolinario Silva Novaes, Pedro Adolpho De Menezes Pacheco Serio, Tatiana Cardoso De Mello Tucunduva, Guilherme Nader Marta, Koichi Sameshima

doi:10.1200/jco.2021.39.15_suppl.e13553

Abstract

e13553 Background: Mammography interpretation presents some challenges; however, better technological approaches have increased the accuracy of cancer diagnosis, and radiologists' sensitivity and specificity for mammography screening now range from 84.5% to 90.6% and from 89.7% to 92.0%, respectively. Since its introduction in breast image analysis, artificial intelligence (AI) has improved rapidly, and deep learning methods are gaining relevance as a companion tool for radiologists. The aim of this systematic review and meta-analysis was therefore to evaluate the sensitivity and specificity of AI deep learning algorithms and of radiologists for breast cancer detection through mammography. Methods: A systematic review was performed in PubMed using the terms deep learning or convolutional neural network and mammography or mammogram, covering January 2015 to October 2020. All titles and abstracts were checked in duplicate; duplicate studies and studies in languages other than English were excluded. The remaining full-text studies were assessed in duplicate, and data were collected from those reporting specificity and sensitivity. For the meta-analysis, studies reporting specificity, sensitivity, and confidence intervals were selected. Heterogeneity was measured using the Cochran Q test (chi-square test) and the I² statistic (percentage of variation). Sensitivity, specificity, and 95% confidence intervals (CI) were calculated using Stata/MP 14.0 for Windows. Results: Among 223 studies, 66 were selected for full-paper analysis and 24 were selected for data extraction. Subsequently, only papers evaluating sensitivity, specificity, CI, and/or AUC were analyzed.
Eleven studies compared the AUC of AI with that of another method, and a differential AUC was calculated for these studies; however, no significant differences were observed: AI vs reader (n = 3; p = 0.109); AI vs AI (n = 5; p = 0.225); AI vs AI + reader (n = 2; p = 0.180); AI + reader vs reader (n = 2; p = 0.655); AI vs reader (n > 1) (n = 3; p = 0.102). Some studies had more than one comparison. A meta-analysis was performed to evaluate the sensitivity and specificity of the methods. Five studies were included in this analysis, and substantial heterogeneity among them was observed. Some studies evaluated more than one AI algorithm, and some compared AI with readers alone or with readers in combination with AI. Sensitivity for AI, AI + reader, and reader alone was 76.08%, 84.02%, and 80.91%, respectively. Specificity for AI, AI + reader, and reader alone was 96.62%, 85.67%, and 84.89%, respectively. Results are shown in the table. Conclusions: Despite recent improvements in AI algorithms for breast cancer screening, no difference in AUC (delta AUC) was observed in comparisons between AI algorithms and readers. [Table: see text]
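The heterogeneity measures named in the Methods (Cochran Q and I²) can be illustrated with a minimal Python sketch. This is not the authors' Stata analysis: the function name `cochran_q_i2`, the fixed-effect inverse-variance weighting, and the example numbers are assumptions made for illustration only, and the abstract does not state which effect scale or weighting scheme was used.

```python
def cochran_q_i2(estimates, variances):
    """Return (Q, degrees of freedom, I2 in percent) for a set of studies.

    Assumes fixed-effect inverse-variance weights; estimates are per-study
    effect sizes (for example logit-transformed sensitivities) and variances
    are their within-study variances.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    # Cochran Q: weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # I2: share of total variation attributed to between-study heterogeneity,
    # truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect sizes and variances, NOT data from the reviewed studies.
example_estimates = [1.10, 1.65, 0.90, 2.05, 1.40]
example_variances = [0.04, 0.02, 0.05, 0.03, 0.06]
q, df, i2 = cochran_q_i2(example_estimates, example_variances)
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

Under the null hypothesis of homogeneity, Q is compared against a chi-square distribution with k − 1 degrees of freedom (k studies), which is why the Methods describe it as a chi-square test.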
