Discordant and false-negative interpretations at digital breast tomosynthesis in the prospective Oslo Tomosynthesis Screening Trial (OTST) using independent double reading.

Ellen B Eben,Randi Gullien,Per Skaane,Stanimir Yanakiev,Bjørn Helge Østerås,Siri H B Brandal,Terese Lie

doi:10.1007/s00330-023-10400-0

Ellen B Eben, Randi Gullien + Show 5 more

Open Access

https://doi.org/10.1007/s00330-023-10400-0

Copy DOI

Abstract

To analyze discordant and false-negatives of double reading digital breast tomosynthesis (DBT) versus digital mammography (DM) including reading times in the Oslo Tomosynthesis Screening Trial (OTST), and reclassify these in a retrospective reader study as missed, minimal sign, or true-negatives. The prospective OTST comparing double reading DBT vs. DM had paired design with four parallel arms: DM, DM + computer aided detection, DBT + DM, and DBT + synthetic mammography. Eight radiologists interpreted images in batches using a 5-point scale. Reading time was automatically recorded. A retrospective reader study including four radiologists classified screen-detected cancers with at least one false-negative score and screening examinations of interval cancers as negative, non-specific minimal sign, significant minimal sign, and missed; the two latter groups are defined "actionable." Statistics included chi-square, Fisher's exact, McNemar's, and Mann-Whitney U tests. Discordant rate (cancer missed by one reader) for screen-detected cancers was overall comparable (DBT (31% [71/227]) and DM (30% [52/175]), p= .81), significantly lower at DBT for spiculated cancers (DBT, 19% [20/106] vs. DM, 36% [38/106], p= .003), but high (28/49 = 57%, p= 0.001) for DBT-only detected spiculated cancers. Reading time and sensitivity varied among readers. False-negative DBT-only detected spiculated cancers had shorter reading time than true-negatives in 46% (13/28). Retrospective evaluation classified the following DBT exams "actionable": three missed by both readers, 95% (39/41) of discordant cancers detected by both modes, all 30 discordant DBT-only cancers, 25% (13/51) of interval cancers. Discordant rate was overall comparable for DBT and DM, significantly lower at DBT for spiculated cancers, but high for DBT-only detected spiculated lesions. Most false-negative screen-detected DBT were classified as "actionable." Retrospective evaluation of false-negative interpretations from the Oslo Tomosynthesis Screening Trial shows that most discordant and several interval cancers could have been detected at screening. This underlines the potential for modern AI-based reading aids and triage, as high-volume screening is a demanding task. • Digital breast tomosynthesis (DBT) screening is more sensitive and has higher specificity compared to digital mammography screening, but high-volume DBT screening is a demanding task which can result in high discordance rate among readers. • Independent double reading DBT screening had overall comparable discordance rate as digital mammography, lower for spiculated masses seen on both modalities, and higher for small spiculated cancer seen only on DBT. • Almost all discordant digital breast tomosynthesis-detected cancers (72 of 74) and 25% (13 of 51) of the interval cancers in the Oslo Tomosynthesis Screening Trial were retrospectively classified as actionable and could have been detected by the readers.

Full Text