Deep learning (DL) CT denoising models have the potential to improve image quality for lower radiation dose exams. These models are generally trained with large quantities of adult patient image data. However, both CT and, increasingly, DL denoising methods are used in adult and pediatric populations. Pediatric body habitus and size can differ significantly from those of adults and vary dramatically from newborns to adolescents. Ensuring that pediatric subgroups of different body sizes are not disadvantaged by DL methods requires evaluations capable of assessing performance in each subgroup.

To assess DL CT denoising in pediatric and adult-sized patients, we built a framework of computer-simulated image quality (IQ) control phantoms and evaluation methodology. The framework's IQ phantoms included pediatric-sized versions of the standard CatPhan 600 and MITA-LCD phantoms, with diameters matching the mean effective diameters of pediatric patients from newborns to 18-year-olds. These phantoms were used to simulate CT images that then served as inputs to a DL denoiser, so that performance could be evaluated across patient sizes. Adult CT test images were simulated using standard-sized phantoms and adult scan protocols. Pediatric CT test images were simulated using pediatric-sized phantoms and adjusted pediatric protocols. The evaluation methodology consisted of denoising both adult and pediatric test images and then assessing changes in image quality, including noise, image sharpness, CT number accuracy, and low-contrast detectability. To demonstrate the use of the framework, a REDCNN denoising model trained on adult patient images was evaluated. To validate that the DL model performance measured with the proposed pediatric IQ phantoms was representative of performance in more realistic patient anatomy, noise reduction performance was also measured in anthropomorphic pediatric XCAT phantoms spanning the same age range.
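As a rough illustration of the noise part of this evaluation, the sketch below measures the standard deviation inside a circular region of interest (ROI) before and after denoising and reports the percent reduction. It is a minimal sketch under stated assumptions, not the framework's actual code: the function names, the ROI placement, and the synthetic images are hypothetical, and the "denoiser" is a simple scaling stand-in rather than REDCNN.

```python
import numpy as np


def roi_noise_std(image, center, radius):
    """Standard deviation of pixel values inside a circular ROI."""
    rows, cols = np.ogrid[:image.shape[0], :image.shape[1]]
    mask = (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2
    return float(image[mask].std())


def noise_reduction_percent(noisy, denoised, center, radius):
    """Percent drop in ROI noise standard deviation after denoising."""
    before = roi_noise_std(noisy, center, radius)
    after = roi_noise_std(denoised, center, radius)
    return 100.0 * (before - after) / before


# Synthetic stand-in for a uniform phantom region (hypothetical values).
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 20.0, size=(128, 128))

# Placeholder "denoiser": scaling zero-mean noise by 0.4.
# This is NOT REDCNN; it only exercises the metric.
denoised = 0.4 * noisy

reduction = noise_reduction_percent(noisy, denoised, center=(64, 64), radius=30)
print(f"Noise reduction: {reduction:.1f}%")
```

Because the stand-in scales the noise by 0.4, the metric reports a 60% reduction by construction. In a real evaluation the same measurement would be repeated for each phantom diameter so that results can be compared across subgroups.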
Using the proposed pediatric-sized IQ phantom framework, we observed that size differences between adult and pediatric-sized phantoms substantially influenced the adult-trained DL denoising model's performance. When applied to adult images, the DL model achieved a 60% reduction in noise standard deviation without substantial loss in sharpness at mid or high spatial frequencies. However, in smaller phantoms the denoising performance dropped because the smaller field of view (FOV) used in pediatric protocols, relative to adult protocols, produced different image noise textures. In the validation study, noise reduction trends in the pediatric-sized IQ phantoms were consistent with those found in the anthropomorphic phantoms.

We developed a framework that uses pediatric-sized IQ phantoms for pediatric subgroup evaluation of DL denoising models. Using the framework, we found that the performance of an adult-trained DL denoiser did not generalize well to the smaller-diameter phantoms corresponding to younger pediatric patient sizes. Our work suggests that noise texture differences from FOV changes between adult and pediatric protocols can contribute to poor generalizability in DL denoising and that the proposed framework is an effective means of identifying these performance disparities for a given model.
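One way to see how a FOV change might alter noise texture as a pixel-based model sees it is to look at the pixel size. The sketch below assumes a fixed reconstruction matrix, so a smaller FOV gives smaller pixels and a noise grain of a given physical size spans more pixels. The matrix size, FOV values, and grain size are hypothetical illustrations, not values taken from the study, and this is only one plausible contributing mechanism.

```python
def pixel_size_mm(fov_mm, matrix_size=512):
    """Pixel size for a square reconstruction of the given FOV."""
    return fov_mm / matrix_size


def nyquist_cycles_per_mm(fov_mm, matrix_size=512):
    """Highest spatial frequency representable on the pixel grid."""
    return 1.0 / (2.0 * pixel_size_mm(fov_mm, matrix_size))


def grain_in_pixels(grain_mm, fov_mm, matrix_size=512):
    """How many pixels a noise grain of fixed physical size spans."""
    return grain_mm / pixel_size_mm(fov_mm, matrix_size)


# Hypothetical FOVs and grain size, chosen only for illustration.
adult_fov_mm = 400.0
pediatric_fov_mm = 200.0
grain_mm = 1.0

for label, fov in [("adult", adult_fov_mm), ("pediatric", pediatric_fov_mm)]:
    print(
        f"{label}: pixel = {pixel_size_mm(fov):.3f} mm, "
        f"Nyquist = {nyquist_cycles_per_mm(fov):.2f} cycles/mm, "
        f"grain = {grain_in_pixels(grain_mm, fov):.2f} px"
    )
```

Under these assumptions, halving the FOV doubles the number of pixels spanned by the same physical noise grain, so a denoiser trained on adult-FOV images would encounter noise with a different pixel-scale appearance in pediatric-FOV images.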