Compressed sensing (CS) is a groundbreaking paradigm in image acquisition that challenges the constraints of the Nyquist–Shannon sampling theorem, enabling high-quality image reconstruction from a minimal number of measurements. The potent feature-extraction capabilities of neural networks allow data-driven CS methods to achieve high-fidelity image reconstruction. However, achieving satisfactory reconstruction performance, particularly in terms of perceptual quality, remains challenging at extremely low sampling rates. To address this challenge, we introduce LD-CSNet, a novel two-stage image CS framework based on latent diffusion. In the first stage, an autoencoder pre-trained on a large dataset represents natural images as low-dimensional latent vectors, establishing prior knowledge distinct from sparsity and effectively reducing the dimensionality of the solution space. In the second stage, a conditional diffusion model performs maximum-likelihood estimation in the latent space, supported by a measurement embedding module that encodes the measurements into a form suitable for the denoising network, thereby guiding the generative process as it reconstructs the low-dimensional latent vectors. Finally, the image is reconstructed by a pre-trained decoder. Experiments on multiple public datasets demonstrate LD-CSNet's superior perceptual quality and robustness to noise, and show that it maintains fidelity and visual quality at low sampling rates. These findings suggest that diffusion models hold promise for image CS; future work could focus on developing more suitable models for the first stage.
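The measurement and conditioning pipeline described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual implementation: the Gaussian sampling matrix, the 4% sampling rate, the latent dimension, and the linear measurement-embedding projection are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 1024            # ambient signal dimension (e.g., a flattened image patch)
rate = 0.04         # an extremely low sampling rate (4%), for illustration
m = int(rate * n)   # number of measurements, m << n

# Random Gaussian sampling matrix -- a common choice in CS; the paper's
# actual learned or structured sampling operator may differ.
A = rng.standard_normal((m, n)) / np.sqrt(m)

x = rng.standard_normal(n)   # stand-in for a ground-truth image vector
y = A @ x                    # compressed measurements

# Hypothetical measurement embedding: project the measurements to the
# latent dimension so they can condition the denoising network that
# estimates the latent vector; the trained decoder would then map that
# latent vector back to image space.
latent_dim = 64
W_embed = rng.standard_normal((latent_dim, m)) / np.sqrt(m)
cond = W_embed @ y           # conditioning vector for the diffusion model

print(y.shape, cond.shape)
```

The point of the sketch is the dimensionality argument: reconstruction is posed over a `latent_dim`-dimensional latent space rather than the `n`-dimensional image space, which is far smaller than the solution space a sparsity prior would operate in.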