Abstract

Purpose: To compare the breast cancer detection performance of radiologists reading mammographic examinations unaided versus supported by an artificial intelligence (AI) system.

Materials and Methods: An enriched retrospective, fully crossed, multireader, multicase, HIPAA-compliant study was performed. Screening digital mammographic examinations from 240 women (median age, 62 years; range, 39-89 years) performed between 2013 and 2017 were included. The 240 examinations (100 showing cancers, 40 leading to false-positive recalls, 100 normal) were interpreted by 14 Mammography Quality Standards Act-qualified radiologists, once with and once without AI support. The readers provided a Breast Imaging Reporting and Data System score and a probability of malignancy. AI support provided radiologists with interactive decision support (clicking on a breast region yields a local cancer likelihood score), traditional lesion markers for computer-detected abnormalities, and an examination-based cancer likelihood score. The area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and reading time were compared between conditions by using mixed-models analysis of variance and generalized linear models for multiple repeated measurements.

Results: On average, the AUC was higher with AI support than with unaided reading (0.89 vs 0.87, respectively; P = .002). Sensitivity increased with AI support (86% [86 of 100] vs 83% [83 of 100]; P = .046), whereas specificity trended toward improvement (79% [111 of 140] vs 77% [108 of 140]; P = .06). Reading time per case was similar (unaided, 146 seconds; supported by AI, 149 seconds; P = .15). The AUC of the AI system alone was similar to the average AUC of the radiologists (0.89 vs 0.87).

Conclusion: Radiologists improved their cancer detection at mammography when using an artificial intelligence system for support, without requiring additional reading time.

Published under a CC BY 4.0 license. See also the editorial by Bahl in this issue.
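
The performance measures reported above can be illustrated with a minimal sketch, assuming hypothetical per-reader data: probability-of-malignancy scores for the AUC and recall decisions for sensitivity and specificity on the enriched 240-case set (100 cancers, 140 non-cancers). The variable names, score distributions, and recall threshold below are illustrative assumptions, not the study's data or its mixed-models analysis.

```python
# Minimal sketch: per-reader, per-condition AUC, sensitivity, and specificity
# on an enriched set of 100 cancer and 140 non-cancer examinations.
# All scores and the recall threshold are simulated/hypothetical.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(seed=0)

# Ground truth: 100 cancers followed by 140 non-cancers (40 false-positive
# recalls + 100 normals in the study design).
y_true = np.concatenate([np.ones(100, dtype=int), np.zeros(140, dtype=int)])

def simulated_scores(separation):
    """Hypothetical probability-of-malignancy scores (0-100) for one reading."""
    cancer = rng.normal(50 + separation, 15, 100)
    noncancer = rng.normal(50 - separation, 15, 140)
    return np.clip(np.concatenate([cancer, noncancer]), 0, 100)

def reader_metrics(scores, recall_threshold=50.0):
    """AUC from the continuous scores; sensitivity/specificity from recalls."""
    recalled = scores >= recall_threshold          # hypothetical recall rule
    auc = roc_auc_score(y_true, scores)
    sensitivity = recalled[y_true == 1].mean()
    specificity = (~recalled)[y_true == 0].mean()
    return auc, sensitivity, specificity

# One reader, slightly better separation with AI support (assumed effect size).
unaided = reader_metrics(simulated_scores(separation=15))
with_ai = reader_metrics(simulated_scores(separation=18))
print("unaided AUC/sens/spec:", [round(v, 2) for v in unaided])
print("with AI AUC/sens/spec:", [round(v, 2) for v in with_ai])
```

In the study itself these per-reader, per-condition measures were compared with mixed-models analysis of variance and generalized linear models; the sketch only shows how the raw metrics are formed.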

Highlights

  • The sample size and examination type distribution for our observer evaluation study population were estimated on the basis of the results of a similar previous study [18], using the unified method proposed by Hillis et al [20], to yield a study power greater than 0.8 (see the simulation sketch after this list)

  • This resulted in a target data set of 240 digital mammographic examinations (100 showing cancer, 40 with false-positive results, and 100 with normal results)

  • Changes in AUC ranged from 0.0 to 0.05, and the AUC was higher with AI support for 12 of the 14 radiologists
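
A back-of-the-envelope check of the power claim in the first highlight can be sketched with a simple simulation. This is a simplified reader-averaged paired comparison, not the unified Hillis method cited above, and every effect-size and variance parameter below is an assumption chosen only for illustration.

```python
# Toy simulation of a 14-reader, 240-case (100 cancer / 140 non-cancer) study:
# estimate the chance that a paired t-test across readers detects an assumed
# AI-related gain in AUC. NOT the Hillis unified MRMC method; it ignores some
# variance components a proper MRMC power analysis accounts for.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

def simulate_study(rng, n_cancer=100, n_noncancer=140, n_readers=14,
                   delta=0.3, case_sd=1.0, reader_sd=0.6):
    """Simulate one study; return per-reader AUCs (unaided, with AI)."""
    y = np.concatenate([np.ones(n_cancer), np.zeros(n_noncancer)])
    # Latent case signal shared by all readers (cancers score higher on average).
    case_signal = rng.normal(1.2 * y, case_sd)
    aucs_unaided, aucs_ai = [], []
    for _ in range(n_readers):
        noise = rng.normal(0, reader_sd, y.size)
        s_unaided = case_signal + noise
        # AI support modeled as a small extra separation on cancer cases.
        s_ai = s_unaided + delta * y + rng.normal(0, 0.2, y.size)
        aucs_unaided.append(roc_auc_score(y, s_unaided))
        aucs_ai.append(roc_auc_score(y, s_ai))
    return np.array(aucs_unaided), np.array(aucs_ai)

def estimated_power(n_sim=500, alpha=0.05, seed=0):
    """Fraction of simulated studies with a significant paired t-test across readers."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        a_unaided, a_ai = simulate_study(rng)
        _, p_value = stats.ttest_rel(a_ai, a_unaided)
        hits += p_value < alpha
    return hits / n_sim

if __name__ == "__main__":
    print(f"estimated power under the assumed effect: {estimated_power():.2f}")
```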
