Low-dose computed tomography (LDCT) reduces radiation exposure, but the resulting noise and artifacts impair diagnostic accuracy. Convolutional neural networks (CNNs) are widely used for LDCT denoising, but their receptive field is limited. Larger kernels enlarge the receptive field and can improve model performance, but they greatly increase computational cost. We aimed to develop an LDCT denoising CNN that combines a large receptive field with low computational complexity. To this end, we developed a multi-scale perceptual modulation network (MSPMnet) built on a novel multi-head decomposable convolution (MHDC) module, which captures multi-scale features and expands the receptive field efficiently. The MHDC module couples max-pooling with three depth-wise convolutions of different kernel sizes via a channel-splitting mechanism; unlike in conventional CNNs, each of the two large two-dimensional kernels is decomposed into a set of cascaded orthogonal one-dimensional kernels so that the module remains lightweight. Further, unlike prior methods that apply a uniform kernel size throughout the network, we introduced a receptive field-ramp mechanism that shifts from local to relatively long-range dependency modeling as the network depth increases, which further improves denoising performance. The proposed MSPMnet was evaluated on a Mayo Clinic dataset against a conventional iterative algorithm, two CNN models, and two Transformer models. The MSPMnet outperformed these baseline methods in both the visual and quantitative assessments. Visually, it preserved structures, edges, and textures while effectively reducing noise and artifacts, and it generated the denoised images closest to the normal-dose computed tomography images.
Quantitatively, the MSPMnet achieved the lowest root-mean-square error (RMSE; 8.3094±1.9325) and the highest peak signal-to-noise ratio (PSNR; 33.8525±1.8213 dB), structural similarity index (SSIM; 0.9309±0.0272), and feature similarity index (FSIM; 0.9699±0.0113). In summary, the proposed MSPMnet effectively removes noise and artifacts from LDCT images while preserving edges, and it outperformed the compared state-of-the-art CNNs and Transformers both quantitatively and qualitatively.
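
To illustrate why decomposing a large depth-wise kernel keeps the module lightweight, the following minimal sketch counts weights and multiply-accumulates for a depth-wise k x k convolution and for a cascaded 1 x k and k x 1 pair, and checks that the pair still spans a k x k receptive field. It is not the authors' implementation: the kernel sizes (7 and 11), the channel count, the feature-map size, the function names, and the stride-1, same-padding, bias-free setting are all assumptions chosen for illustration.

```python
def depthwise_2d_cost(channels, k, height, width):
    """Weights and multiply-accumulates (MACs) of one depth-wise k x k convolution."""
    params = channels * k * k        # one k x k filter per channel, no bias
    macs = params * height * width   # each output pixel uses every weight of its channel
    return params, macs


def depthwise_decomposed_cost(channels, k, height, width):
    """Cost of a depth-wise 1 x k convolution followed by a depth-wise k x 1 convolution."""
    params = channels * (k + k)      # two one-dimensional filters per channel
    macs = params * height * width
    return params, macs


def stacked_receptive_field(kernel_shapes):
    """Receptive field (rows, cols) of stride-1 convolutions applied in sequence."""
    rows = 1 + sum(kh - 1 for kh, _ in kernel_shapes)
    cols = 1 + sum(kw - 1 for _, kw in kernel_shapes)
    return rows, cols


if __name__ == "__main__":
    channels, height, width = 16, 64, 64   # hypothetical feature-map size
    for k in (7, 11):                      # hypothetical "large" kernel sizes
        full = depthwise_2d_cost(channels, k, height, width)
        split = depthwise_decomposed_cost(channels, k, height, width)
        field = stacked_receptive_field([(1, k), (k, 1)])
        print(f"k={k}: full={full}, decomposed={split}, receptive field={field}")
```

Under these assumptions the cost of the decomposed pair grows linearly in k rather than quadratically, a saving of a factor of k/2, while the receptive field stays k x k. The trade-off is that a cascaded 1 x k and k x 1 pair can only represent separable (rank-1) kernels, so it is less expressive than a full k x k filter.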
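
For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of RMSE and PSNR for a reference (normal-dose) image and a denoised estimate, each given as a flat list of pixel values. The function names and the `data_range` argument are assumptions for illustration; the abstract does not state the intensity range or windowing used in the evaluation, so this code is not expected to reproduce the reported numbers.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return math.sqrt(squared_error / len(reference))


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the assumed maximum intensity span."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf              # identical images
    return 20.0 * math.log10(data_range / error)
```

Lower RMSE and higher PSNR both indicate a denoised image closer to the reference. Because PSNR depends on `data_range`, values are comparable only between methods evaluated with the same intensity range.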