Pretraining deep convolutional networks on natural images helps with medical image analysis tasks, which is important given the limited number of clinically annotated medical images. However, many two-dimensional pretrained backbone networks are currently available to choose from. This work compared 18 different backbones from 5 architecture groups (pretrained on ImageNet) for the task of assessing [18F]FDG brain positron emission tomography (PET) image quality (reconstructed at seven simulated doses), based on three clinical image quality metrics (global quality rating, pattern recognition, and diagnostic confidence). Using two-dimensional randomly sampled patches, up to eight patients (at three dose levels each) were used for training, with three separate patient datasets used for testing. Each backbone was trained five times with the same training and validation sets, and with six cross-validation folds. Training only the final fully connected layer (with ~6,000-20,000 trainable parameters) achieved a test mean absolute error of ~0.5, which was within the intrinsic uncertainty of clinical scoring. To compare the "classical" and over-parameterized regimes, the pretrained weights of the last 40% of the network layers were then unfrozen. The mean absolute error fell below 0.5 for 14 of the 18 backbones assessed, including two that had previously failed to train. In general, backbones with residual units (e.g. DenseNets and ResNetV2s) were well suited to this task, achieving the lowest mean absolute error at test time (~0.45-0.5). This proof-of-concept study shows that over-parameterization may also be important for automated PET image quality assessment.
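The two training regimes described above (training only the final layer versus unfreezing the last 40% of layers) and the reported error metric can be illustrated with a minimal, framework-free sketch. The function names, the rounding rule used to turn 40% into a layer count, and the example values are assumptions made for illustration; this is not the authors' implementation and it does not reproduce their backbones or data.

```python
def trainable_flags(n_layers, unfrozen_fraction=0.4):
    """Return one boolean per layer; True marks a layer whose weights are updated.

    Assumption: the unfrozen layers are the last ones in the network, and the
    fraction is converted to a layer count by rounding to the nearest integer.
    """
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    if not 0.0 <= unfrozen_fraction <= 1.0:
        raise ValueError("unfrozen_fraction must lie in [0, 1]")
    n_unfrozen = round(unfrozen_fraction * n_layers)
    n_frozen = n_layers - n_unfrozen
    # Layers with index >= n_frozen are left trainable; earlier ones stay frozen.
    return [index >= n_frozen for index in range(n_layers)]


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


if __name__ == "__main__":
    # Hypothetical 10-layer backbone: the last 4 layers become trainable.
    print(trainable_flags(10))
    # Hypothetical predicted versus clinical scores.
    print(mean_absolute_error([3.0, 4.0], [3.5, 3.5]))
```

In a deep learning framework, the same flags would typically be applied to each layer's trainable setting before compiling the model; setting `unfrozen_fraction` to a value that unfreezes only the final layer corresponds to the first regime described above.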