Automated assessment of task-based performance of digital mammography and tomosynthesis systems using an anthropomorphic breast phantom and deep learning-based scoring.

Andrey Makeev, Paul Jahnke, Kaiyan Li, Mark A Anastasio, Stephen J Glick, Arthur Emig

doi:10.1117/1.jmi.12.s1.s13005

Abstract

Conventional metrics used for assessing digital mammography (DM) and digital breast tomosynthesis (DBT) image quality, including noise, spatial resolution, and detective quantum efficiency, do not necessarily predict how well the system will perform in a clinical task. Existing phantom-based methods have their own limitations, such as unrealistic uniform backgrounds, subjective scoring by human readers, and regular signal patterns unrepresentative of common clinical findings. We attempted to address this problem with a realistic breast phantom with random hydroxyapatite microcalcifications and semi-automated deep learning-based image scoring. Our goal was to develop a methodology for objective task-based assessment of image quality for tomosynthesis and DM systems, which includes an anthropomorphic phantom, a detection task (microcalcification clusters), and automated performance evaluation using a convolutional neural network.

Experimental 2D and pseudo-3D mammograms of an anthropomorphic inkjet-printed breast phantom with inserted microcalcification clusters were collected on clinical mammography systems to train a signal-present/signal-absent image classifier based on the ResNet-18 architecture. In a separate validation study using simulations, this ResNet-18 classifier was shown to approach the performance of an ideal observer. Microcalcification detection performance was evaluated at four dose levels using receiver operating characteristic (ROC) analysis, with the area under the ROC curve (AUC) as the figure of merit. To demonstrate the use of this evaluation approach for assessing different technologies, the method was applied to two different mammography systems, as well as to mammograms with re-binned pixels emulating a lower-resolution X-ray detector.

Microcalcification detectability, as assessed by the deep learning classifier, was observed to vary with the exposure incident on the breast phantom for both DM and tomosynthesis. At full dose, experimental AUC was 0.96 for DM and 0.95 for DBT, whereas at half dose, it dropped to 0.85 and 0.71, respectively. AUC for DM decreased significantly with the larger effective pixel size obtained by re-binning. The task-based assessment approach also showed the superiority of a newer mammography system compared with an older system.

An objective task-based methodology for assessing the image quality of mammography and tomosynthesis systems is proposed. Possible uses for this tool include quality control, acceptance, and constancy testing; assessing the safety and effectiveness of new technology for regulatory submissions; and system optimization. The results from this study showed that the proposed evaluation method using a deep learning model observer can track differences in microcalcification signal detectability with varied exposure conditions.
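As an illustration of two operations named in the abstract, the following minimal Python sketch shows (a) an empirical AUC computed from signal-present and signal-absent classifier scores, and (b) pixel re-binning to emulate a coarser detector. It is not the authors' implementation. The function names, the example scores, and the choice to average (rather than sum) pixel values within each block are assumptions made here for illustration; the abstract does not specify these details.

```python
def auc_from_scores(present_scores, absent_scores):
    """Empirical AUC (Mann-Whitney form): the fraction of
    (signal-present, signal-absent) score pairs in which the
    signal-present score is higher, with ties counted as one half."""
    if not present_scores or not absent_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(present_scores) * len(absent_scores))


def rebin(image, factor):
    """Average non-overlapping factor x factor blocks of a 2D image
    (list of equal-length rows). Averaging is an assumption; a summing
    variant would differ only by a constant scale factor."""
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor)
                     for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Hypothetical classifier outputs, not data from the study.
    present = [0.9, 0.8, 0.6, 0.4]
    absent = [0.7, 0.3, 0.2, 0.1]
    print(auc_from_scores(present, absent))

    # Hypothetical 2 x 4 image re-binned by a factor of 2.
    print(rebin([[1, 3, 5, 7], [1, 3, 5, 7]], 2))
```

In this form, an AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two classes, which is the scale on which the DM and DBT values above are reported.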

Full Text