The noise in digital breast tomosynthesis (DBT) includes x-ray quantum noise and detector readout noise. The total radiation dose of a DBT scan is kept at about the level of a digital mammogram, but the detector noise is increased because multiple projections are acquired. The high noise can degrade the detectability of subtle lesions, particularly microcalcifications (MCs). We previously developed a deep-learning-based denoiser to improve the image quality of DBT. In the current study, we conducted an observer performance study with breast radiologists to investigate the feasibility of using deep-learning-based denoising to improve the detection of MCs in DBT. We used a modular breast phantom set containing seven 1-cm-thick heterogeneous 50% adipose/50% fibroglandular slabs custom-made by CIRS, Inc. (Norfolk, VA). We made six 5-cm-thick breast phantoms embedded with 144 simulated MC clusters of four nominal speck sizes (0.125-0.150, 0.150-0.180, 0.180-0.212, and 0.212-0.250 mm) at random locations. The phantoms were imaged with a GE Pristina DBT system using the automatic standard (STD) mode. The phantoms were also imaged with the STD+ mode, which increased the average glandular dose by 54%, to serve as a reference condition for comparison of the radiologists' readings. Our previously trained and validated denoiser was applied to the STD images to obtain a denoised DBT set (dnSTD). Seven breast radiologists participated as readers to detect the MCs in the DBT volumes of the six phantoms under the three conditions (STD, STD+, dnSTD), totaling 18 DBT volumes. Each radiologist read all 18 DBT volumes sequentially; the volumes were arranged in a different order for each reader in a counterbalanced manner to minimize potential reading order effects. The radiologists marked the location of each detected MC cluster and provided a conspicuity rating and a confidence level for the perceived cluster.
Visual grading characteristics (VGC) analysis was used to compare the conspicuity ratings and the confidence levels of the radiologists for the detection of MCs. The average sensitivities over all MC speck sizes were 65.3%, 73.2%, and 72.3%, respectively, for the radiologists reading the STD, dnSTD, and STD+ volumes. The sensitivity for dnSTD was significantly higher than that for STD (p<0.005, two-tailed Wilcoxon signed-rank test) and comparable to that for STD+. The average false positive rates were 3.9±4.6, 2.8±3.7, and 2.7±3.9 marks per DBT volume, respectively, for reading the STD, dnSTD, and STD+ images, but the differences between dnSTD and either STD or STD+ did not reach statistical significance. The overall conspicuity ratings and confidence levels by VGC analysis for dnSTD were significantly higher than those for both STD and STD+ (p≤0.001). The critical alpha value for significance was adjusted to 0.025 with Bonferroni correction. This observer study using breast phantom images showed that deep-learning-based denoising has the potential to improve the detection of MCs in noisy DBT images and to increase radiologists' confidence in differentiating noise from MCs without increasing radiation dose. Further studies are needed to evaluate the generalizability of these results to the wide range of DBT images from human subjects and patient populations in clinical settings.
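
As an illustration of the statistical comparison described above, the following is a minimal sketch of a two-tailed exact Wilcoxon signed-rank test on paired observations, judged against a Bonferroni-adjusted alpha of 0.025. Several things here are assumptions rather than details from the study: the function name `wilcoxon_signed_rank_exact`, pairing by reader, the reading of the 0.025 threshold as 0.05 divided by two comparisons, and above all the per-reader sensitivities, which are made-up placeholder numbers and not the study data. The study's own software and unit of pairing are not stated in the abstract.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Two-tailed exact Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. The p-value is obtained by
    enumerating all 2**n sign assignments, so this is only practical for
    small n (hypothetical helper, not the study's code).
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = sum(ranks) / 2
    observed_deviation = abs(w_plus - mean)

    # Exact null distribution: each rank is positive or negative with p = 1/2.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean) >= observed_deviation - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# PLACEHOLDER per-reader sensitivities (%), invented for illustration only.
sens_std = [60.0, 62.5, 66.0, 64.0, 68.5, 70.0, 66.0]
sens_dnstd = [69.0, 70.5, 73.0, 74.5, 75.0, 76.0, 74.5]

# Assumption: alpha of 0.05 split across two comparisons (Bonferroni).
alpha = 0.05 / 2

w_plus, p_value = wilcoxon_signed_rank_exact(sens_std, sens_dnstd)
significant = p_value < alpha
print(f"W+ = {w_plus}, p = {p_value:.6f}, significant at {alpha}: {significant}")
```

With only a handful of pairs the exact test has a coarse set of attainable p-values, which is one reason the unit of pairing matters when interpreting a reported significance level.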