Illumination consistency is a key factor for seamlessly integrating virtual objects with real scenes in augmented reality (AR) systems. High dynamic range (HDR) panoramic images are widely used to estimate scene lighting accurately. However, generating environment maps requires complex deep network architectures, which cannot operate on devices with limited memory space. To address this issue, this paper proposes CGLight, an effective illumination estimation method that predicts HDR panoramic environment maps from a single limited field-of-view (LFOV) image. We first design a CMAtten encoder to extract features from input images, which learns the spherical harmonic (SH) lighting representation with fewer model parameters. Guided by the lighting parameters, we train a generative adversarial network (GAN) to generate HDR environment maps. In addition, to enrich lighting details and reduce training time, we specifically introduce the color consistency loss and independent discriminator, considering the impact of color properties on the lighting estimation task while improving computational efficiency. Furthermore, the effectiveness of CGLight is verified by relighting virtual objects using the predicted environment maps, and the root mean square error and angular error are 0.0494 and 4.0607 in the gray diffuse sphere, respectively. Extensive experiments and analyses demonstrate that CGLight achieves a balance between indoor illumination estimation accuracy and resource efficiency, attaining higher accuracy with nearly 4 times fewer model parameters than the ViT-B16 model.
Read full abstract